
Mathl. Comput. Modelling Vol. 19, No. 1, pp. 7-19, 1994
Copyright © 1994 Elsevier Science Ltd

Printed in Great Britain. All rights reserved

0895-7177(94)E0002-5   0895-7177/94 $6.00 + 0.00

Database Placement in Communication Networks for Minimizing the Overall Transmission Cost

X.-M. LIN,* M. E. ORLOWSKA AND Y.-C. ZHANG
Department of Computer Science, The University of Queensland
St. Lucia, QLD 4072, Australia
{lxue,maria,yan}@cs.uq.oz.au

(Received October 1993; accepted November 1993)

Abstract-The minimum spanning tree write policy for maintaining the consistency of a distributed database with replicated data was proposed in [1]. In this paper, we first present a data placement heuristic algorithm in general networks for minimizing the overall transmission cost for processing the typical demands of queries (by a “simple” process strategy) and updates (by the minimum spanning tree write policy). Several interesting optimality estimation results for this algorithm are shown, and the computational intractability of the complete optimization, with respect to the simple strategy, is shown as well. Secondly, we apply a classical hill climbing technique to obtain a dynamic database placement algorithm based on an employed optimizer, that is, a collection of distributed query processing algorithms. This algorithm is guaranteed to output a “locally optimal” data allocation. The implementation results also show that the two heuristics work well in practice.

Keywords-Communication cost, Data placement, Distributed database, Network, Optimization.

1. INTRODUCTION

Data transmission cost study in a network usually includes the investigation of the path selection

problem [2]. In this paper, we assume that data communication between different sites travels

through the shortest path [2] which is described as follows.

Given a network, each link $l_i$ has been assigned a maximum volume $w_i$ (i.e., link capacity) for a packet movement, and each packet pays cost $c_i$ to cross the link. Here $c_i$ could be either the time for a packet to traverse the link or the expense for a packet movement. Data that must travel between different sites is first split into packets in order to cross the links to its destination. Further, each packet has size $w_i$ with respect to each link $l_i$, except for at most one packet whose size may be smaller than $w_i$. In a heterogeneous communication network, the $c_i$ and $w_i$ are not necessarily the same for all links, and repacking may be required at a gateway (i.e., an interconnection between different local networks). Suppose that data volume $N$ is required to go from site $a$ to site $b$ in a network through a path $p$ which consists of several links $l_i$ ($l_i \in p$). Thus, the total transmission cost for $N$ to pass the path can be expressed as

$$\sum_{l_i \in p} \left\lceil \frac{N}{w_i} \right\rceil c_i.$$

This can be approximately rewritten as

$$N \sum_{l_i \in p} \frac{c_i}{w_i}.$$

*Corresponding author.


The shortest path data communication is to choose a path p connecting a and b such that

$\sum_{l_i \in p} c_i / w_i$ is minimized.

A communication network N can be represented as a weighted undirected graph [3] (V, E,p)

such that each vertex in V represents a site, each edge represents a two-way communication link,

and p is assigned so that for each edge ei, p(ei) = ci/wi. In this paper, we may assume that all

p(ei) are positive integers, since we can redefine the cost unit to make all ci/wi integers.
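The packet-based cost model described above can be checked numerically. The following is a minimal sketch, assuming that a volume crossing a link with capacity w is split into ceil(volume / w) packets that each pay the link cost c; the function names and the sample link parameters are illustrative and do not come from the paper.

```python
from math import ceil


def exact_path_cost(volume, links):
    """Packet-level cost: every link (c, w) carries ceil(volume / w) packets."""
    return sum(ceil(volume / w) * c for c, w in links)


def approx_path_cost(volume, links):
    """Per-unit approximation: volume times the sum of c / w over the path."""
    return volume * sum(c / w for c, w in links)


# Hypothetical two-link path, given as (c_i, w_i) pairs.
path = [(2, 100), (3, 50)]

exact_even = exact_path_cost(1000, path)    # volume divides evenly on both links
exact_odd = exact_path_cost(1050, path)     # one partly filled packet on the first link
approx_odd = approx_path_cost(1050, path)
```

The two values agree when the volume is a multiple of every link capacity and differ by at most one packet cost per link otherwise, which is why the weight p(e_i) = c_i / w_i is a reasonable per-unit edge weight.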

We consider a database which is located (possibly replicated) among the sites in a communication network. One of the major concerns in distributed databases is to minimize the transmission cost of data shipping between different sites for processing a transaction, since it has been recognized over the last decade that the transmission cost is the dominant factor in processing a transaction in a distributed database.

A number of significant results in processing a distributed query for minimizing transmission cost have been achieved in the last decade [4-14]. Meanwhile, approaches to processing a distributed update with minimum transmission cost can be found in [1,15,16].

Of course, the following question has arisen in distributed database design. How can we find the optimal data placement such that the overall transmission cost for processing typical demands of transactions may be minimized? This data placement (data allocation) problem, also previously called the file-allocation problem in distributed computing, has been extensively studied [15-21]. Achievements in the classical file-allocation problem were surveyed in [19]. A simple application

of the file-allocation problem leads to a simple treatment (simple strategy) for processing a

distributed query. The simple treatment requires only that all referenced (by a query) relations

are sent to the result site, at which a query result is required, to finally perform the query at

the result site. Their works deal with processing a distributed query either through a uniform

communication network [17] (each pair of sites must be connected by a two-way communication

link, and all wi and ci are the same) or by the simple treatment [1,16,21].

The minimum spanning tree policy (see Section 2) for an update propagation has been introduced recently in [1,16] for solving the data placement problem. However, they still use the

simple strategy for processing a query. Further, to achieve the optimal solution, only special

networks are considered.

To address the limited coverage of the previous works, we consider a general network and address the data placement problem not only with respect to the simple treatment for processing a query. We also use the minimum spanning tree policy for an update propagation. These features distinguish our work from all previous works.

Generally, each of the following factors will substantially impact on the quality and overall

performance of a data allocation:

1. How do we split a relation into several fragments?

2. What kind of strategy is used to process each transaction?

3. What kind of approach is used to find a data allocation, based on the solutions of 1 and 2, to minimize the overall communication costs?

There are a number of works on how to split a relation into fragments [22-26]. In this paper,

we assume that a fragmentation, the set of fragments, has been automatically obtained by using the information about a set of most frequently used transactions [24].

The relationship between factor 2 and factor 3 makes the data allocation problem logically

intractable even if a fragmentation is given:

The quality of a data allocation can be evaluated through transaction processing strategies.

Different strategies may lead to different minimum overall transmission costs. On the other


hand, without specific data allocation, it is generally difficult to justify what strategies should be

chosen. This logical intractability is mainly caused by considering the inter-relationship between the

fragments (relations or objects) for processing a transaction. The following will be applied to solving the above logical intractability. We first adopt the

simple treatment for a query process, and the minimum spanning tree policy for an update

propagation, to develop a data allocation algorithm on a given database fragmentation. Then, we propose an approach to refine the data allocation obtained from this algorithm. This refinement

approach starts from an initial data allocation, and then iteratively and greedily reduces the

overall transmission cost by a classical hill climbing technique based on an employed optimizer.

Because current developments in query process optimization are far from perfect, our refinement

approach may allow the use of any query process optimization approach. The rest of the paper is organized as follows. In Section 2, we present some preliminaries which

include the necessary formalizations. As well, we present the motivations of our research in more

detail. In Section 3, a heuristic algorithm is presented for obtaining a data allocation under given

simple query strategies, together with its theoretical performance guarantee. In Section 4, we

present a dynamic approach for obtaining a data allocation in cooperation with an employed

optimizer. This is followed by the conclusions, remarks, and experiment reports.

2. PRELIMINARIES AND MOTIVATIONS

This section includes a background discussion with respect to networks, fragments, and trans-

action process strategies.

2.1. Networks

As described above, a communication network $N$ can be represented as a weighted graph $(V, E, p)$ such that $p$ assigns each edge $e_i \in E$ a positive integer $p(e_i)$ which represents the cost for a unit data volume to traverse this link. The transmission cost, as we mean in this paper, of a data volume $U$ shipped through a link $e$ is $U \times p(e)$. Note that we can use $(u, v)$ to represent an edge where $u$ and $v$ are its two ends. A network $N$ is metric if it is fully connected, and for any triple $u, v, w$ of distinct sites:

$$p((u, v)) \le p((u, w)) + p((w, v)).$$

A communication network $N = (V, E, p)$ has a corresponding metric network $\hat{N} = (V, \hat{E}, \hat{p})$ such that $\hat{N}$ is constructed from $N$ as follows:

• $E \subseteq \hat{E}$,

• $\hat{p}(e) = p(e)$ if $e \in E$,

• $\hat{p}((u, v))$ is defined to be the length of the shortest path connecting $u$ and $v$ in $N$ if $(u, v) \notin E$.

Note that we assume that the communication between two sites in a network is through the shortest path. It follows that finding a data allocation with the minimum overall transmission cost on a network is equivalent to finding such a data allocation on its metric map.

In the rest of the paper, we consider only metric networks. “Metric network” is abbreviated

to “network.”
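The construction of the metric map can be sketched with an all-pairs shortest-path computation. The code below is an illustration under stated assumptions, not the authors' implementation: nodes are numbered 0..n-1, the function name `metric_map` is hypothetical, and the sketch stores the shortest-path length for every pair of sites, which coincides with p(e) on an existing edge whenever that edge is itself a shortest path.

```python
INF = float("inf")


def metric_map(n, edges):
    """Return the n x n matrix of shortest-path lengths (Floyd-Warshall).

    edges maps an undirected pair (u, v) to its positive weight p((u, v)).
    """
    dist = [[0 if u == v else INF for v in range(n)] for u in range(n)]
    for (u, v), weight in edges.items():
        dist[u][v] = min(dist[u][v], weight)
        dist[v][u] = min(dist[v][u], weight)
    for k in range(n):
        for u in range(n):
            for v in range(n):
                if dist[u][k] + dist[k][v] < dist[u][v]:
                    dist[u][v] = dist[u][k] + dist[k][v]
    return dist


# Hypothetical path network 0 - 1 - 2 with weights 1 and 2.
closure = metric_map(3, {(0, 1): 1, (1, 2): 2})
```

In the resulting matrix the missing link between sites 0 and 2 receives the length of the two-hop path, so the triangle inequality holds by construction.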

2.2. Fragmentations, Data Allocation, and Transactions

A primary fragmentation of a relational database is a set $F = \{f_i : 1 \le i \le m\}$ of fragments with the property that $f_i \ne f_j$ if $i \ne j$. For the correctness issue of a fragmentation, we refer readers to [22,24-26]. Allocation of a primary fragmentation may induce a duplicated fragmentation in which each fragment may have several copies.

For the remainder of this paper, an element in a primary fragmentation will always be called a

fragment, while each element in a duplicated fragmentation will be called a copy of a fragment.

Clearly, we need only to consider the allocation problem of a primary fragmentation. Thus,

“primary fragmentation” is abbreviated to “fragmentation” in the rest of the paper.

A data allocation (data placement) $L$ of $F$ on a network $N = (V, E, p)$ is a mapping from $F$ to $2^V$. That is, for each fragment $f_i \in F$, $L(f_i) \subseteq V$. A data allocation with respect to a fragment $f_i$ is also called a data allocation (data placement) of $f_i$.

Without loss of generality, in this paper we assume that the transactions are either query only

or update only, and transactions are expressed by fragments. For a transaction consisting of several queries and updates, we could view it as several transactions.

2.3. Transaction Processings

Suppose that $F$ is a fragmentation, $N = (V, E, p)$ is a network, $L$ is a data allocation of $F$ on $N$, and $T$ is a transaction set. Further, let $F = \{f_i : 1 \le i \le m\}$ and $V = \{j : 1 \le j \le n\}$.

In this section, an overview is given of the current developments of techniques for processing a transaction in distributed databases.

Minimum spanning tree policy for an update

The consistency of a database should be maintained after an update. The following assumption

about the expression of an update may atomically maintain the consistency between different

fragments:

a transaction causing an update of several different fragments is always assumed to be

expressed in terms of those fragments.

Further, to maintain the consistency between different copies of a fragment, one needs to

propagate an update between those copies. Suppose that a user at node j of the network N

issues an update of a fragment fi. The following update strategy, MST-strategy, is applied to

update all copies of fi located on N:

• the route for processing this transaction is a minimum spanning tree of $L(f_i) \cup \{j\}$ in $N$.

For example, a given network, as illustrated in Figure 1, has the fragment fi located at site 1,

site 2, and site 4. A transaction issued at site 3 requires an update of fi. Using MST-strategy,

the update route is from site 3 to site 4, and then from site 4 to site 1 and site 2. In earlier papers

[17-19], the naive broadcasting policy (as called in [1]) was applied to update copies of the same

fragment. The naive policy asked the issue site to send write data content directly to every site

that has a copy of the fragment. In the above example, the route via the broadcasting strategy is

from site 3 to sites 1, 2, and 4. Usually, MST-strategy leads to a smaller transmission cost than

the naive broadcasting strategy, since the route of the naive policy is also a spanning tree.

Let $U_{ij}$ denote the total data volume required by the transactions in $T$ issued at node $j$ to update the fragment $f_i$. The overall transmission cost for updating $f_i$ by MST-strategy is:

$$\sum_{j=1}^{n} U_{ij}\, \mathrm{vmsp}(j, L(f_i)). \tag{1}$$

Here $\mathrm{vmsp}(j, L(f_i))$ is the summation of the weights of the links of the minimum spanning tree of $\{j\} \cup L(f_i)$ in $N$. The overall transmission cost to process the update type transactions of $T$ with respect to $F$ and $L$ is:

$$\sum_{i=1}^{m} \sum_{j=1}^{n} U_{ij}\, \mathrm{vmsp}(j, L(f_i)). \tag{2}$$

[Figure 1. MST-strategy.]

We should note that MST-strategy is optimal if we consider only the routes consisting of the

issuing site and the sites where fi is located. Finding the optimal propagation policy for an update is

NP-hard if a propagation route is allowed to include arbitrary sites (minimum Steiner tree problem [27]).

In the rest of the paper, we shall use MST-strategy for an update process.
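To make the comparison between MST-strategy and naive broadcasting concrete, the sketch below computes both route weights on a small metric network. It is an illustration only: the distance matrix is hypothetical (the weights of Figure 1 are not recoverable from the text), sites 1 to 4 are stored at indices 0 to 3, and the helper names `vmsp` and `broadcast` are chosen here for readability.

```python
def vmsp(j, copies, dist):
    """Weight of a minimum spanning tree of {j} united with copies (Prim's algorithm)."""
    nodes = set(copies) | {j}
    in_tree = {j}
    total = 0
    while in_tree != nodes:
        # Cheapest link from the tree built so far to a node not yet reached.
        weight, nxt = min(
            (dist[u][x], x) for u in in_tree for x in nodes - in_tree
        )
        total += weight
        in_tree.add(nxt)
    return total


def broadcast(j, copies, dist):
    """Naive policy: the issuing site j sends directly to every copy."""
    return sum(dist[j][k] for k in copies if k != j)


# Hypothetical metric network: site 4 (index 3) is at distance 1 from all others.
DIST = [
    [0, 2, 2, 1],
    [2, 0, 2, 1],
    [2, 2, 0, 1],
    [1, 1, 1, 0],
]

# Update issued at site 3 (index 2); copies held at sites 1, 2 and 4.
mst_weight = vmsp(2, {0, 1, 3}, DIST)
naive_weight = broadcast(2, {0, 1, 3}, DIST)
```

With these assumed weights the spanning tree routes the update through site 4, while broadcasting pays for three direct transfers from site 3.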

Distributed query process

According to a query processing optimization technique [28], we may always assume that

selection and projection operations (initial process) have been pushed down on the query tree

when processing a query; that is, it is necessary to first process the relevant selections and

projections on the required fragments at the sites where those fragments are located.

A simple strategy to process a query is to send the contents of all fragments, which are required

for access, to the query result site to perform the query. Meanwhile, if a fragment has several

copies in the allocation L, then the closest copy to the result site is chosen. For example, assume that $f_1(A, B)$ and $f_2(B, C)$ are two fragments, and that $f_1$ is located at site 1 and $f_2$ is allocated at site 2 and site 4 of the network illustrated in Figure 1. A query, resulting at site 3, is represented in SQL [29] as follows:

SELECT A, B, C

FROM f1, f2

WHERE f1.B = f2.B AND f1.A <= 50.

The simple strategy to process this query is to do a selection on $f_1$ at site 1 with the condition f1.A <= 50, and then to send the result of the selection and the copy of fragment $f_2$ at site 4 to site 3 to implement the join.

Let $Q_{ij}$ denote the total data volume of $f_i$ that must be sent to site $j$ to process, by the simple strategy, the queries in $T$ whose results are required at site $j$. Following this, the overall transmission cost to process the queries in $T$ is:

$$\sum_{i=1}^{m} \sum_{j=1}^{n} Q_{ij}\, d(j, L(f_i)), \tag{3}$$

where $d(j, L(f_i)) = \min\{p((j, l)) : l \in L(f_i)\}$ (the shortest distance from $j$ to $L(f_i)$ in $N$) if $j \notin L(f_i)$, and $d(j, L(f_i)) = 0$ if $j \in L(f_i)$. The overall transmission cost for processing the queries in $T$ with respect to $f_i$ is defined as:

$$\sum_{j=1}^{n} Q_{ij}\, d(j, L(f_i)). \tag{4}$$
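The query cost under the simple strategy is a direct sum over fragments and result sites. The following sketch evaluates it for the two-fragment example; the distance matrix and the data volumes are hypothetical, sites 1 to 4 are stored at indices 0 to 3, and the function names are illustrative.

```python
def d(j, copies, dist):
    """Distance from site j to the closest copy; zero if j holds a copy."""
    if j in copies:
        return 0
    return min(dist[j][l] for l in copies)


def query_cost(Q, allocation, dist):
    """Sum of Q[i][j] * d(j, L(f_i)) over all fragments i and sites j."""
    n = len(dist)
    return sum(
        Q[i][j] * d(j, allocation[i], dist)
        for i in range(len(Q))
        for j in range(n)
    )


# Hypothetical metric network with site 4 (index 3) at distance 1 from all others.
DIST = [
    [0, 2, 2, 1],
    [2, 0, 2, 1],
    [2, 2, 0, 1],
    [1, 1, 1, 0],
]

# f1 is held at site 1; f2 is held at sites 2 and 4.
allocation = [{0}, {1, 3}]

# The query results at site 3 (index 2) and needs 10 units of f1 and 20 of f2.
Q = [
    [0, 0, 10, 0],
    [0, 0, 20, 0],
]

total = query_cost(Q, allocation, DIST)
```

Here the copy of f2 at site 4 is chosen because it is closer to the result site than the copy at site 2, matching the closest-copy rule of the simple strategy.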

There are a number of other ways developed to process a distributed query (especially to

process a join) for reducing the transmission cost with respect to the simple strategy, while the


problem of minimizing the transmission cost is known to be NP-complete [9,30]. Those proposed

approaches can be classified into two classes.

One is called the semi-join reduction approach. This approach suggests to first implement a

semi-join on each referenced relation (fragment), and then send each semi-join reduced relation to

the result site to perform the join. Numerous algorithms [4,6-8,10,11,13,31] have been published

in the semi-join reduction approach, which are aimed at producing the optimal execution plan

for a distributed join. The other is called the join-based approach. This approach suggests using the join operation

only to reduce transmission cost. That is, instead of sending each referenced relation directly to

the result site, one may check each group of several fragments (relations) to see whether or not

the operation of implementing the join on them at some site and then sending the result to the

result site to perform the final join will save the transmission cost. Several papers [9,14,32] have

addressed this approach. Recently, a combination of these two approaches has been studied [7,33,34].

2.4. Motivations

Our motivations for the research are based on observations as follows:

Observation 1. The coverage of the previous works on data allocation has some limitation as pointed out in Section 1. Also, those works are concentrated on a special case in which

they are interested. This leads to difficulties for a generalization.

Observation 2. An obtained execution plan (better than the simple strategy) for a distributed query process usually should depend on a specific data allocation.

Observation 3. The overall transmission cost of a data allocation also depends on the employed strategies for processing transactions.

Thus, in this paper we will put our emphasis on a general network to resolve the problem in

Observation 1. We use a refinement algorithm to improve the quality of data allocation where

only the simple strategy has been addressed to resolve the problems in Observations 2 and 3.

3. A DATA ALLOCATION ALGORITHM UNDER THE SIMPLE STRATEGY

Suppose that $F = \{f_i : 1 \le i \le m\}$ is a fragmentation, $N = (V, E, p)$ is a network with the node set $V = \{j : 1 \le j \le n\}$, and $T$ is a transaction set. In this section, we investigate the

problem of finding a data allocation L of F on N so that the overall transmission cost to process

the transactions in T by the simple query strategy and MST-strategy is minimized.

Let Uij denote the total data volume required by the transactions in T which are issued at

node j to update the fragment fi, and let Qij denote the total data volume of fi required to send

to the result site j to process the queries in T by the simple strategy. The above optimization

problem can be precisely stated as the following problem.

Simple data allocation problem (SDAP)

Find an allocation L of F on N so that the following value is minimized:

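$$\sum_{i=1}^{m}\sum_{j=1}^{n} Q_{ij}\, d(j, L(f_i)) + \sum_{i=1}^{m}\sum_{j=1}^{n} U_{ij}\, \mathrm{vmsp}(j, L(f_i)). \tag{5}$$

The SDAP is equivalent to the problem of finding an allocation $L$ of each $f_i$ so that the following value is minimized:

$$\sum_{j=1}^{n} Q_{ij}\, d(j, L(f_i)) + \sum_{j=1}^{n} U_{ij}\, \mathrm{vmsp}(j, L(f_i)). \tag{6}$$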


Thus, to investigate SDAP, we need only to investigate the following problem.

Simple data allocation problem of one fragment (SDAPOF)

Find an allocation L for a given fragment fi so that (6) is minimized.

In the appendix, we will show that both SDAP and SDAPOF are NP-hard.

THEOREM 1. Both SDAP and SDAPOF are NP-hard.

In the following, we present an effective heuristic algorithm, SIMPLE, to solve SDAPOF for

each fragment fi. Then we combine all the produced data allocations on each individual fragment

to obtain an approximation solution of SDAP.

3.1. Algorithm SIMPLE

Since SDAPOF is NP-hard, it is unlikely for us to find a polynomial time algorithm to solve

it [27]. In this section, we present an approximation algorithm, SIMPLE, for SDAPOF. We first

characterize some necessary conditions for the optimal solution of SDAPOF to reduce the search

space. For $1 \le i \le m$, let $U_i = \sum_{j=1}^{n} U_{ij}$; and for $1 \le i \le m$ and $1 \le j \le n$, let $B_{ij} = Q_{ij} + U_{ij} - U_i$. Here $U_i$ denotes the total update data volume on $f_i$ required by $T$, and $B_{ij}$ provides a beneficial measurement at site $j$. The total transmission cost with respect to a fragment $f_i$ and a placement $L$ of $f_i$ to process the transactions in $T$ by the simple strategy and MST-strategy is:

$$\mathrm{cost}(L(f_i)) = \sum_{j=1}^{n} Q_{ij}\, d(j, L(f_i)) + \sum_{j=1}^{n} U_{ij}\, \mathrm{vmsp}(j, L(f_i)).$$

The following lemma says that, in an optimal allocation, we may always locate a copy of $f_i$ at a site if the beneficial measurement at the site is nonnegative.

LEMMA 1. Given a fragment $f_i$, suppose that $L$ is an arbitrary allocation of fragment $f_i$. Then $\mathrm{cost}(L_1(f_i)) \le \mathrm{cost}(L(f_i))$, where $L_1(f_i) = L(f_i) \cup \{j\}$ for a site $j$ with $B_{ij} \ge 0$.

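PROOF. Without loss of generality, we may assume that $j \notin L(f_i)$. We may immediately verify the following properties according to the definitions:

$$d(j, V_1) \le d(j, V_2) \quad \text{if } V_2 \subseteq V_1; \tag{7}$$

$$\mathrm{vmsp}(k, V_1 \cup \{j\}) \le d(j, V_1) + \mathrm{vmsp}(k, V_1); \tag{8}$$

$$\mathrm{vmsp}(k, V_1 \cup \{j\}) = \mathrm{vmsp}(j, V_1) \quad \text{if } j = k; \tag{9}$$

where $V_1$ and $V_2$ are subsets of the node set of the network. From the above properties, it follows that:

$$\mathrm{cost}(L(f_i)) \ge \sum_{k \notin L_1(f_i)} Q_{ik}\, d(k, L_1(f_i)) + Q_{ij}\, d(j, L(f_i)) + \sum_{k=1}^{n} U_{ik}\, \mathrm{vmsp}(k, L(f_i)), \tag{10}$$

$$\mathrm{cost}(L_1(f_i)) \le \sum_{k \notin L_1(f_i)} Q_{ik}\, d(k, L_1(f_i)) + U_{ij}\, \mathrm{vmsp}(j, L(f_i)) + \sum_{k \ne j} U_{ik} \big( \mathrm{vmsp}(k, L(f_i)) + d(j, L(f_i)) \big). \tag{11}$$

Take (10) from (11):

$$\mathrm{cost}(L_1(f_i)) - \mathrm{cost}(L(f_i)) \le -B_{ij}\, d(j, L(f_i)) \le 0. \qquad \blacksquare$$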
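The subtraction can be spelled out term by term. The worked step below is a sketch that uses only the bounds (10) and (11) together with the definitions $U_i = \sum_k U_{ik}$ and $B_{ij} = Q_{ij} + U_{ij} - U_i$. The query sums over $k \notin L_1(f_i)$ cancel, and the spanning tree terms cancel because $U_{ij}\,\mathrm{vmsp}(j, L(f_i)) + \sum_{k \ne j} U_{ik}\,\mathrm{vmsp}(k, L(f_i)) = \sum_{k} U_{ik}\,\mathrm{vmsp}(k, L(f_i))$.

```latex
\begin{align*}
\mathrm{cost}(L_1(f_i)) - \mathrm{cost}(L(f_i))
  &\le \Big( \sum_{k \ne j} U_{ik} \Big)\, d(j, L(f_i)) - Q_{ij}\, d(j, L(f_i)) \\
  &= (U_i - U_{ij} - Q_{ij})\, d(j, L(f_i)) \\
  &= -B_{ij}\, d(j, L(f_i)).
\end{align*}
```

Since $d(j, L(f_i)) \ge 0$, the sign of the bound is controlled by $B_{ij}$ alone: adding a copy at a site with $B_{ij} \ge 0$ cannot increase the cost.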

From the proof of Lemma 1, we have the following corollary.

COROLLARY 1. Given a fragment $f_i$, suppose that $L$ is an arbitrary data allocation. Then $\mathrm{cost}(L_1(f_i)) < \mathrm{cost}(L(f_i))$, where $L_1(f_i) = L(f_i) \cup \{j\}$ for a node $j$ with $B_{ij} > 0$ and $j \notin L(f_i)$.

From Corollary 1, it follows that an allocation $L$ of $f_i$ with the minimum value of $\mathrm{cost}(L(f_i))$ must allocate $f_i$ to those nodes $j$ with $B_{ij} > 0$. As well, it follows from Lemma 1 that we may assume that an allocation $L$ with the minimum value of $\mathrm{cost}(L(f_i))$ always allocates $f_i$ to those nodes $j$ with $B_{ij} = 0$ in the case that $U_i > 0$. Thus, an allocation $L$ of $f_i$ with the minimum overall communication cost $\mathrm{cost}(L(f_i))$ can be viewed as an extension of the data allocation which allocates $f_i$ to those nodes $j$ with $B_{ij} \ge 0$. This is the basic idea for the development of the algorithm SIMPLE.

The algorithm SIMPLE starts by allocating copies of $f_i$ to those nodes $j$ with $B_{ij} \ge 0$. Then it iteratively extends the allocation by adding one copy to a node such that each time, the chosen extension has the minimum communication cost. Finally, a data allocation is chosen such that its cost is minimized among the initial one and these extensions.

Algorithm 1. SIMPLE
Input: {Q_ij, U_ij : 1 ≤ j ≤ n}, N, f_i;
Output: L is an allocation of f_i;
{ if U_i > 0 then
    { V_0 := {j : B_ij ≥ 0};
      if V_0 = ∅ then cost(V_0) := ∞;
      V_temp := V − V_0;
      for j = 1 to |V| − |V_0| do
        { choose a node l in V_temp such that cost(V_{j−1} ∪ {l}) is minimized;
          V_j := V_{j−1} ∪ {l};
          V_temp := V_temp − {l}; }
      Choose a V_j such that cost(V_j) is minimized;
      L(f_i) := V_j; }
  else L(f_i) := {j : Q_ij > 0}. }
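A compact executable reading of Algorithm 1 is given below. It is a sketch under stated assumptions rather than the authors' code: the network is supplied as a metric distance matrix, sites are indexed from 0, Q and U are the per-site volumes for one fragment, ties are broken by the lowest site index, and all function names and sample volumes are illustrative.

```python
def d(j, copies, dist):
    """Distance from j to the closest copy (zero if j itself holds one)."""
    return min(dist[j][l] for l in copies)


def vmsp(j, copies, dist):
    """Weight of a minimum spanning tree of {j} united with copies (Prim)."""
    nodes, in_tree, total = set(copies) | {j}, {j}, 0
    while in_tree != nodes:
        w, x = min((dist[u][x], x) for u in in_tree for x in nodes - in_tree)
        total += w
        in_tree.add(x)
    return total


def cost(copies, Q, U, dist):
    """cost(L(f_i)) for one fragment; an empty placement costs infinity."""
    if not copies:
        return float("inf")
    return sum(
        Q[j] * d(j, copies, dist) + U[j] * vmsp(j, copies, dist)
        for j in range(len(dist))
    )


def simple(Q, U, dist):
    """Greedy placement of one fragment following the steps of Algorithm 1."""
    n, total_u = len(dist), sum(U)
    if total_u == 0:
        return {j for j in range(n) if Q[j] > 0}
    current = {j for j in range(n) if Q[j] + U[j] - total_u >= 0}  # V_0
    best, best_cost = set(current), cost(current, Q, U, dist)
    remaining = set(range(n)) - current
    while remaining:
        # Extend by the node whose addition gives the cheapest placement.
        l = min(sorted(remaining), key=lambda x: cost(current | {x}, Q, U, dist))
        current = current | {l}
        remaining.discard(l)
        c = cost(current, Q, U, dist)
        if c < best_cost:
            best, best_cost = set(current), c
    return best


# Hypothetical metric network: index 3 is at distance 1 from every other site.
DIST = [[0, 2, 2, 1], [2, 0, 2, 1], [2, 2, 0, 1], [1, 1, 1, 0]]

read_heavy = simple([0, 0, 100, 0], [1, 0, 0, 0], DIST)
update_heavy = simple([1, 1, 1, 1], [5, 5, 5, 5], DIST)
```

In the update-heavy case every B_ij is negative, so V_0 is empty and the sketch falls back to the best single-copy placement, which is the central site.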

3.2. Performance Evaluation of the Algorithm SIMPLE

Clearly, this algorithm runs in polynomial time. Further, we have the following performance

guarantee.

THEOREM 2. For a fragment $f_i$, suppose that $L$ is a data allocation given by the algorithm SIMPLE and $L_{opt}$ is the data allocation with the minimum overall transmission cost to process the given transactions under the simple strategy. Further, suppose that $C$ is the largest value of the weights of the edges and $c$ is the smallest value of the weights of the edges in the network. Then

$$\frac{\mathrm{cost}(L(f_i))}{\mathrm{cost}(L_{opt}(f_i))} \le \frac{C}{c}.$$

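PROOF. Clearly, if $U_i = 0$ then the algorithm SIMPLE will output a data allocation with the minimum overall transmission cost 0. Below, we prove that this theorem is true for $U_i > 0$.

Suppose that $V_0 = \{j : B_{ij} \ge 0\}$ and $L_1$ is the allocation of $f_i$ such that $L_1(f_i) = L_{opt}(f_i) \cup V_0$. From Lemma 1, it follows that $\mathrm{cost}(L_1(f_i)) = \mathrm{cost}(L_{opt}(f_i))$.

Let $L_0$ be an allocation of $f_i$ such that $L_0(f_i) = V_0$ if $V_0 \ne \emptyset$, otherwise $L_0(f_i) = \{j\}$ where $j$ is an arbitrary node in $L_{opt}(f_i)$. It is clear that $L_0(f_i) \subseteq L_1(f_i)$.

From the algorithm SIMPLE, we have that $\mathrm{cost}(L(f_i)) \le \mathrm{cost}(L_0(f_i))$. We now prove that

$$\frac{\mathrm{cost}(L_0(f_i))}{\mathrm{cost}(L_1(f_i))} \le \frac{C}{c}.$$

Suppose that $|L_0(f_i)| = K_0 + 1$ and $|L_1(f_i)| = K_1 + 1$. We have that

$$\begin{aligned}
\mathrm{cost}(L_0(f_i)) &= \sum_{j \notin L_0(f_i)} Q_{ij}\, d(j, L_0(f_i)) + \sum_{j=1}^{n} U_{ij}\, \mathrm{vmsp}(j, L_0(f_i)) \\
&\le \sum_{j \notin L_1(f_i)} Q_{ij} C + \sum_{j \in L_1(f_i) - L_0(f_i)} Q_{ij} C + K_0 \sum_{j \in L_0(f_i)} U_{ij} C + (K_0 + 1) \sum_{j \notin L_0(f_i)} U_{ij} C \\
&\le \sum_{j \notin L_1(f_i)} (Q_{ij} + U_{ij}) C + \sum_{j \in L_1(f_i) - L_0(f_i)} (Q_{ij} + U_{ij}) C + K_0 U_i C.
\end{aligned}$$

From the fact that $L_0(f_i)$ contains all nodes $j$ with $B_{ij} \ge 0$ and the above fact, it follows that

$$\mathrm{cost}(L_0(f_i)) \le \sum_{j \notin L_1(f_i)} (Q_{ij} + U_{ij}) C + K_1 U_i C.$$

Similarly,

$$\begin{aligned}
\mathrm{cost}(L_1(f_i)) &= \sum_{j \notin L_1(f_i)} Q_{ij}\, d(j, L_1(f_i)) + \sum_{j=1}^{n} U_{ij}\, \mathrm{vmsp}(j, L_1(f_i)) \\
&\ge \sum_{j \notin L_1(f_i)} Q_{ij} c + K_1 \sum_{j \in L_1(f_i)} U_{ij} c + (K_1 + 1) \sum_{j \notin L_1(f_i)} U_{ij} c \\
&\ge \sum_{j \notin L_1(f_i)} (Q_{ij} + U_{ij}) c + K_1 U_i c.
\end{aligned}$$

Hence,

$$\frac{\mathrm{cost}(L_0(f_i))}{\mathrm{cost}(L_1(f_i))} \le \frac{C}{c}. \qquad \blacksquare$$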

COROLLARY 2. Suppose that the algorithm SIMPLE is applied to each fragment $f_i$ in $F$ to obtain the allocation $L_i$ of $f_i$. Then, in the data allocation $L$ of $F$ such that $L(f_i) = L_i(f_i)$ for each $f_i$, we have that

$$\frac{\sum_{i=1}^{m} \mathrm{cost}(L(f_i))}{\sum_{i=1}^{m} \mathrm{cost}(L_{opt}(f_i))} \le \frac{C}{c},$$

where C is the largest value of the weights of the edges and c is the smallest value of the weights

of the edges in the given network.
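The step from Theorem 2 to Corollary 2 is a termwise summation. As a short worked sketch, write $a_i = \mathrm{cost}(L(f_i))$ and $b_i = \mathrm{cost}(L_{opt}(f_i))$ (shorthand introduced here only for this derivation); Theorem 2 gives $a_i \le (C/c)\, b_i$ for every fragment, and the objective (5) is the sum over fragments of the per-fragment objective (6), so the optimum of SDAP is obtained by optimizing each fragment separately.

```latex
\begin{align*}
\sum_{i=1}^{m} a_i \;\le\; \sum_{i=1}^{m} \frac{C}{c}\, b_i
  \;=\; \frac{C}{c} \sum_{i=1}^{m} b_i
\quad\Longrightarrow\quad
\frac{\sum_{i=1}^{m} a_i}{\sum_{i=1}^{m} b_i} \;\le\; \frac{C}{c}
\qquad \text{whenever } \sum_{i=1}^{m} b_i > 0 .
\end{align*}
```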

From Corollary 2, it follows that:

COROLLARY 3. Suppose that the algorithm SIMPLE is applied to each fragment $f_i$ in $F$ to obtain the allocation $L_i$ of $f_i$ in a uniform network where $c = C$. Then the data allocation $L$ of $F$ with $L(f_i) = L_i(f_i)$ for each $f_i$ has the minimum overall transmission cost under the simple strategy and MST-strategy.

Furthermore, suppose the given communication network $N$ is not fully connected, each edge has the same weight, and the diameter [3] of the underlying graph is $d$ (for example, the graph in Figure 2 has diameter 2). Then the edge weights in the metric map of $N$ satisfy $C/c = d$, so after application of the algorithm SIMPLE on the metric map of $N$ to get a data allocation $L$, we have:

$$\frac{\sum_{i=1}^{m} \mathrm{cost}(L(f_i))}{\sum_{i=1}^{m} \mathrm{cost}(L_{opt}(f_i))} \le d.$$

[Figure 2. Diameter 2.]

Theoretically, the algorithm SIMPLE has a good performance for a local network, that is, a network

such that C/c is small. Our experiments show that in practice, this algorithm is quite efficient and

quite effective (in most cases among random experiments it achieves the optimal) for a general

network.

3.3. A Heuristic Algorithm: REFINEMENT

As pointed out in Section 2, recent query processing optimization techniques can produce an execution plan better than the simple strategy. The

data allocation output by the algorithm SIMPLE needs to be refined because the strategies to

process queries are not necessarily simple. In this section, we present a framework of a refinement

algorithm, based on an employed distributed optimizer OP, on the data allocation Lo obtained by

the algorithm SIMPLE. Suppose that cost(L, T, OP) is the overall transmission cost to process T

by OP on the data allocation L.

A local modification of a data allocation $L$ of $F$ is either:

• for a fragment $f_i$, drop a copy of $f_i$ from a node $j$ with $j \in L(f_i)$; or

• for a fragment $f_i$, add a copy of $f_i$ to a node $j$ with $j \notin L(f_i)$; or

• for a fragment $f_i$, move a copy of it from a node $j$ with $j \in L(f_i)$ to a node $l$ with $l \notin L(f_i)$.

A data allocation L of F on N is locally optimal with respect to a distributed optimizer OP if

no local modification will reduce the overall communication cost to process T by OP.

The algorithm REFINEMENT iteratively refines Lo through the choice of a local modification,

so that the overall communication cost is greedily reduced, until there is no reduction.

Algorithm 2. REFINEMENT
Input: F is a fragmentation, N is a network with node set V, OP is a distributed optimizer, L_0 is a data allocation of F on N;
Output: L is a data allocation of F on N;
{ L := L_0;
  repeat
    c_0 := cost(L_0, T, OP); c := c_0;
    for each fragment f_i do
      { for each j ∈ L_0(f_i) do
          for each node k ≠ j do
            { Locally modify L_0 to produce L_1 so that L_1(f_i) := L_0(f_i) − {j} ∪ {k}, and so that L_1(f_l) := L_0(f_l) for f_l ≠ f_i;
              c_1 := cost(L_1, T, OP);
              if c_1 < c then { L := L_1; c := c_1 } };
        for j ∈ V − L_0(f_i) do
          { Locally modify L_0 to produce L_1 so that L_1(f_i) := L_0(f_i) ∪ {j}, and so that L_1(f_l) := L_0(f_l) for f_l ≠ f_i;
            c_1 := cost(L_1, T, OP);
            if c_1 < c then { L := L_1; c := c_1 } } };
    L_0 := L;
  until c = c_0 (no reduction on c_0). }

It is clear that in the algorithm REFINEMENT, each iteration runs in polynomial time if

OP runs in polynomial time for each transaction. Each iteration never increases the overall

communication cost, and the algorithm will stop if there is no reduction in the overall transmission

cost. In practice, we can choose a fixed number as the maximal iteration times. Let Ti be a

subset of the given transaction set in which all transactions are required to access fragment fi.

To implement the algorithm REFINEMENT efficiently, for each local modification of fragment fi,

we only need to run the optimizer again for the transactions in Ti instead of the whole T.

From the algorithm REFINEMENT, the following property immediately holds:

PROPOSITION 1. The algorithm REFINEMENT produces a locally optimal data allocation of F.
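The hill-climbing loop of REFINEMENT can be sketched independently of any particular optimizer. In the code below, `total_cost` is a caller-supplied function standing in for cost(L, T, OP); the function name `refinement`, the `max_iters` cap, and the toy cost used in the example are assumptions made for illustration and are not part of the paper.

```python
def refinement(alloc, sites, total_cost, max_iters=100):
    """Steepest-descent local search over drop, add and move modifications.

    alloc maps each fragment to a frozenset of sites; total_cost(alloc)
    returns the overall transmission cost of an allocation.
    """
    current = dict(alloc)
    for _ in range(max_iters):
        base = total_cost(current)
        best, best_cost = current, base
        for frag, copies in current.items():
            candidates = []
            for j in copies:
                for k in sites:
                    if k != j:
                        # Move the copy from j to k; this is a plain drop
                        # when k already holds a copy.
                        candidates.append((copies - {j}) | {k})
            for j in sites - copies:
                candidates.append(copies | {j})  # add a copy at j
            for cand in candidates:
                trial = dict(current)
                trial[frag] = frozenset(cand)
                c = total_cost(trial)
                if c < best_cost:
                    best, best_cost = trial, c
        if best_cost >= base:
            return current  # no local modification reduces the cost
        current = best
    return current


# Toy stand-in for an optimizer-based cost: distance to a known target placement.
TARGET = frozenset({1, 2})


def toy_cost(alloc):
    return len(alloc["f1"] ^ TARGET)


result = refinement({"f1": frozenset({0})}, {0, 1, 2}, toy_cost)
```

Because each pass accepts only the single best improving modification and stops when none exists, the returned allocation is locally optimal with respect to the supplied cost function, in the sense defined above.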

3.4. Experiments and Remarks

Note that the algorithm REFINEMENT allows the employed optimizer to use any kind of

distributed query process algorithms. To test the effectiveness and efficiency of the algorithm

REFINEMENT, we have developed a prototype of a distributed process optimizer JK which

consists of MST-strategy and the algorithm in [9]. The distributed query process algorithm

in [9] is a join-based approach, and it may produce the optimal (minimum transmission cost)

execution plan for processing a chain query by the join-based approach. We have tested the algorithms SIMPLE and REFINEMENT in extensive experiments in which the product of the node set size of a network and the fragmentation size ranges from 18 to 100. In our environment, we

always assume that each given query in the given transaction set T should have a chain order

(i.e., the implementation of a query follows that of a chain query), like the assumption in [9]. The

experiments show that the combination of the algorithms SIMPLE and REFINEMENT works

very well in practice. We also observe that the algorithm REFINEMENT is slower, and in most

cases, outputs a worse locally optimal allocation if it starts from an arbitrary initial allocation.

We report our experiment results below.

First, we test the efficiency of the combination. The experiments show that the algorithm REFINEMENT converges very fast when it uses the data allocation output by the algorithm SIMPLE as the initial allocation. We have not experienced more than 10 iterations.

Second, we test the effectiveness of the combination. Suppose that $L_{min}$ is the data allocation with the minimum overall transmission cost under the restriction that each query is processed in a chain order and is processed by the join-based approach [9], and that $L_{loc}$ is the data allocation output through the algorithm SIMPLE and the algorithm REFINEMENT. Our extensive experiments have the following statistical results.

• In the case that the product of the node set size and the fragmentation size is not greater than 18,

$$\frac{\mathrm{cost}(L_{min}, T, JK)}{\mathrm{cost}(L_{loc}, T, JK)} > 0.95.$$

• In the case that the product of the node set size and the fragmentation size is from 18 to 100, for each experiment we randomly generate $10^6$ data allocations $L_{rad}$. A random data allocation $L_{rad}$ has no more than a $10^{-3}$ chance to be better than $L_{loc}$. Also,

$$\frac{\mathrm{cost}(L_{rad}, T, JK)}{\mathrm{cost}(L_{loc}, T, JK)} > 0.95.$$

Note that we may apply the algorithm REFINEMENT to a data allocation under the simple strategy. But our experiments showed that the algorithm SIMPLE is effective enough; that is, in most cases the algorithm REFINEMENT does not improve the output of the algorithm SIMPLE.

For future study, we shall try to extend the work in this paper to a network such that the

cost of each link is not a constant, but a function which depends on the network load and other

factors.

APPENDIX

In this appendix, we prove Theorem 1. Because SDAP and SDAPOF are equivalent, we prove

the NP-hardness of SDAPOF in the following.

From [27], it follows that we need only to prove the NP-completeness of the following problem:

Allocation Problem (AP)

INSTANCE: Given a network $N$ with $n$ nodes, a fragment $f_i$, two integers $U_{ij}$ and $Q_{ij}$ for each node $j$ in $N$, and an integer $K$.

QUESTION: Is there a data allocation $L$ of $f_i$ such that (6) is not greater than $K$?

LEMMA 2. AP is NP-complete.

PROOF. Note that the following problem has been shown to be NP-complete in [27].

Steiner Tree Problem (STP)

INSTANCE: Given a metric network $N = (V, E, p)$, a subset $V' \subseteq V$, and an integer $l$.

QUESTION: Is there a subtree $(V_1, E_1)$ of $N$ such that $V' \subseteq V_1$ and $\sum_{e \in E_1} p(e) \le l$?

For each instance $I_1 = (N, V', l)$ of STP, where $N = (V, E, p)$ is a graph, $V = \{j : 1 \le j \le n\}$ and $V' \subseteq V$, we now construct an instance $I_2 = (\bar{N}, \{U_{ij}, Q_{ij} : 1 \le j \le n\}, K)$ of AP as follows, where $\bar{N} = (\bar{V}, \bar{E}, \bar{p})$ is a network:

• $\bar{N} = N$;

• for $1 \le j \le n$, $U_{ij} = c$ and $Q_{ij} = nc$ if $j \in V'$, where $c$ is a constant integer, and both $U_{ij}$ and $Q_{ij}$ are zero if $j \notin V'$;

• $K = c\,l\,|V'|$.

It can be immediately shown that for a solution $(V_1, E_1)$ of STP, we may construct an allocation $L$, where $L(f_i) = V_1$, which is a solution of AP. Also from Lemma 1 and Corollary 1, it follows that for a solution $L$ of AP, the minimum spanning tree of $L(f_i)$ is a solution of STP. Hence the lemma holds. ∎

Thus, Theorem 1 holds.
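The construction used in the proof of Lemma 2 is mechanical, and a short sketch may help in checking it on small instances. The function names `stp_to_ap` and `is_steiner_solution`, the dictionary representation of $U_{ij}$ and $Q_{ij}$, and the edge-weight encoding are assumptions made for this illustration; only the values assigned to $U_{ij}$, $Q_{ij}$ and $K$ are taken from the construction itself.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = Tuple[int, int]


def stp_to_ap(n: int, terminals: Set[int], l: int,
              c: int = 1) -> Tuple[Dict[int, int], Dict[int, int], int]:
    """Map an STP instance (nodes 1..n, subset V', bound l) to AP parameters."""
    U = {j: (c if j in terminals else 0) for j in range(1, n + 1)}
    Q = {j: (n * c if j in terminals else 0) for j in range(1, n + 1)}
    K = c * l * len(terminals)
    return U, Q, K


def is_steiner_solution(tree_edges: Iterable[Edge],
                        weight: Dict[FrozenSet[int], int],
                        terminals: Set[int], l: int) -> bool:
    """Check that the edges form a tree covering V' with total weight <= l."""
    edges = list(tree_edges)
    nodes = {v for e in edges for v in e} or set(terminals)
    if not terminals <= nodes or len(edges) != len(nodes) - 1:
        return False
    # Depth-first search; with |E| = |V| - 1, connectivity implies a tree.
    adj = {v: [] for v in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes and sum(weight[frozenset(e)] for e in edges) <= l


# Path 1-2-3 plus an isolated node 4, unit weights, V' = {1, 3}, l = 2.
weights = {frozenset({1, 2}): 1, frozenset({2, 3}): 1}
U, Q, K = stp_to_ap(4, {1, 3}, l=2)
print(U, Q, K)
print(is_steiner_solution([(1, 2), (2, 3)], weights, {1, 3}, 2))
```

Here the nodes of $V'$ receive $U_{ij} = c$ and $Q_{ij} = nc$, all other nodes receive zero, and the bound is $K = c\,l\,|V'|$, exactly as in the three items of the construction.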

REFERENCES

1. O. Wolfson and A. Milo, The multicast policy and its relationship to replicated data placement, ACM Transactions on Database Systems 16 (1), 181-205 (1991).
2. Y. Mansour and B. Patt-Shamir, Greedy packet scheduling on shortest paths, Proceedings of the 10th ACM Symposium on Principles of Distributed Computing, pp. 165-176, (1991).
3. J.A. Bondy and U.S.R. Murty, Graph Theory with Applications, Macmillan, (1978).
4. P.M.G. Apers, A. Hevner and S.B. Yao, Optimization algorithms for distributed queries, IEEE Transactions on Software Engineering SE-9 (1), 57-68 (1983).
5. P.A. Bernstein and D. Chiu, Using semi-joins to solve relational queries, Journal of ACM 28 (1), 25-40 (1981).
6. P.A. Bernstein, N. Goodman, E. Wong, C.L. Reeve and J.B. Rothnie, Query processing in a system for distributed databases (SDD-1), ACM Transactions on Database Systems 6 (4), 602-625 (1981).
7. M.-S. Chen and P.S. Yu, Interleaving a join sequence with semijoins in distributed query processing, IEEE Transactions on Parallel and Distributed Systems 3 (5), 611-621 (1992).
8. A.R. Hevner and S.B. Yao, Query processing in distributed database systems, IEEE Transactions on Software Engineering SE-5 (3), 177-187 (1979).
9. M.W. Orlowski, On optimisation of joins in distributed database system, Future Database '92, pp. 106-114, World Scientific, (1992).
10. S. Pramanik and D. Vineyard, Optimizing join queries in distributed databases, IEEE Transactions on Software Engineering 14 (9), 1319-1326 (1988).
11. C.P. Wang, V.O.K. Li and A.L.P. Chen, One-shot semi-join execution strategies for processing distributed queries, 7th IEEE Data Engineering Conference, pp. 756-763, (1991).
12. E. Wong, Dynamic rematerialization: Processing distributed queries using redundant data, IEEE Transactions on Software Engineering SE-9 (3), 228-232 (1983).
13. C.T. Yu and C.C. Chang, Distributed query processing, ACM Computing Surveys 16 (4) (1984).
14. C.T. Yu, Z.M. Ozsoyoglu and K. Lam, Optimization of distributed tree queries, Journal of Computer and System Science 29, 399-433 (1984).
15. X. Lin, M. Orlowska and Y. Zhang, On data allocation with the minimum overall communication cost in distributed database design, International Conference on Information and Computing 93, IEEE Computer Press.
16. Replicated fragment allocation using a clustering technique, Proceedings of 4th Australian Database Conference, (1993).
17. P.M.G. Apers, Data allocation in distributed database systems, ACM Transactions on Database Systems 13 (3), 263-304.
18. R.G. Casey, Allocation of copies of a file in information network, Proceedings of the 1972 Spring Joint Computer Conference, AFIPS, pp. 617-625, (1972).
19. L.W. Dowdy and D.V. Foster, Comparative models of the file assignment problem, ACM Computing Surveys 14 (2), 287-313 (1982).
20. H.L. Morgan and K.D. Levin, Optimal program and data location in computer network, Communications of ACM 20 (5), 315-322 (1977).
21. D. Sacca and G. Wiederhold, Database partitioning in a cluster of processors, ACM Transactions on Database Systems 10 (1), 28-56 (1985).
22. S. Navathe et al., Vertical partitioning algorithms for database design, ACM Transactions on Database Systems 9 (4), 680-710 (1984).
23. S. Navathe and M. Ra, Vertical partitioning for database design: A graphical algorithm, ACM SIGMOD, 440-450 (1989).
24. X. Lin, M. Orlowska and Y. Zhang, A graph based cluster approach for vertical partitioning in database design, Data and Knowledge Engineering (1993) (to appear).
25. Y. Zhang, M. Orlowska and B. Colomb, An efficient test for the validity of hybrid knowledge fragmentation in distributed databases, International Journal of Software Engineering and Knowledge Engineering 2 (4), 589-609 (1992).
26. Y. Zhang, On horizontal fragmentation of distributed database design, Proceedings of Australian Database Conference, (1993).
27. M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman and Company, (1978).
28. D. Maier, Theory of Relational Databases, Computer Science Press, (1983).
29. C.J. Date, An Introduction to Database Systems, Addison-Wesley, (1982).
30. C.P. Wang, The complexity of processing tree queries in distributed databases, 2nd IEEE Symposium on Parallel and Distributed Processing, pp. 604-611, (1990).
31. D. Chiu, P.A. Bernstein and Y. Ho, Optimizing chain queries in a distributed database system, SIAM Journal on Computing 13 (1), 116-134 (1984).
32. M.-S. Chen and P.S. Yu, Using join operations as reducers in distributed query processing, Proceedings of the 2nd Intern'l Symposium on Databases in Parallel and Distributed Systems, pp. 116-123, (1990).
33. M.-S. Chen and P.S. Yu, Using combination of join and semijoin operations for distributed query processing, Proceedings of the 2nd Intern'l Symposium on Databases in Parallel and Distributed Systems, pp. 328-335, (1990).
34. M.-S. Chen and P.S. Yu, Determining beneficial semijoins for a join sequence in distributed query processing, IEEE International Conference on Data Engineering, pp. 50-58, (1991).
35. S. Ganguly, W. Hasan and R. Krishnamurthy, Query optimization for parallel execution, SIGMOD Record 21 (2), 9-18 (1992).
36. R. Sedgewick, Algorithms, Addison-Wesley, (1988).
