Data Fragmentation
In a DDB, decisions must be made regarding which site should be used to store which portions of the database.
For now, we will assume that there is no replication; that is, each relation – or portion of a relation – is stored at one site only.

One possible database state for the COMPANY relational database schema

Data Replication and Allocation
If a fragment is stored at more than one site, it is said to be replicated.
Replication is useful in improving the availability of data. The most extreme case is replication of the whole database at every site in the distributed system, thus creating a fully replicated distributed database.

Data Replication and Allocation
This can improve availability remarkably, because the system can continue to operate as long as at least one site is up.
It also improves the performance of retrieval for global queries, because the results of such queries can be obtained locally from any one site; hence, a retrieval query can be processed at the local site where it is submitted, if that site includes a server module.

Data Replication and Allocation
The disadvantage of full replication is that it can slow down update operations drastically, since a single logical update must be performed on every copy of the database to keep the copies consistent. This is especially true if many copies of the database exist.
Full replication makes the concurrency control and recovery techniques more expensive than they would be if there were no replication.

Data Replication and Allocation
The other extreme from full replication involves having no replication – that is, each fragment is stored at exactly one site. In this case, all fragments must be disjoint, except for the repetition of primary keys among vertical (or mixed) fragments. This is also called nonredundant allocation.
Between these two extremes, we have a wide spectrum of partial replication of the data – that is, some fragments of the database may be replicated whereas others may not. The number of copies of each fragment can range from one up to the total number of sites in the distributed system.

Partial Replication Examples
A special case of partial replication occurs frequently in applications where mobile workers – such as sales forces, financial planners, and claims adjusters – carry partially replicated databases with them on laptops and PDAs and synchronize them periodically with the server database.
A description of the replication of fragments is sometimes called a replication schema.

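To make this concrete, a replication schema can be modeled as a simple mapping from each fragment to the set of sites that hold a copy of it. The sketch below is illustrative only; the fragment and site names are assumptions, not taken from a particular system.

```python
# A minimal sketch of a replication schema: a mapping from each
# fragment to the set of sites holding a copy. Fragment and site
# names here are hypothetical, for illustration only.
replication_schema = {
    "EMPD5": {"site2"},                          # stored at one site only
    "DEPARTMENT": {"site1", "site2", "site3"},   # fully replicated
    "WORKS_ON5": {"site2", "site3"},             # partially replicated
}

def sites_holding(fragment: str) -> set[str]:
    """Return the sites that hold a copy of the given fragment."""
    return replication_schema.get(fragment, set())

print(sites_holding("DEPARTMENT"))  # e.g. {'site1', 'site2', 'site3'}
```
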
Data %eplication "nd "llocation
/ach fragment or each copy of a fragment must be
assigned to a particular site in the distributed system. 'his
process is called data distribution or data allocation -.
'he choice of sites and the degree of replication depend on
the performance and a&ailability goals of the system and on
the types and fre*uencies of transactions submitted at
each site.
Data %eplication "nd "llocation
For e(ample, if high a&ailability is re*uired
and transactions can be submitted at any
site and if most transactions are retrie&alonly, a fully replicated database is a good
choice.
Data Replication and Allocation
However, if certain transactions that access particular parts of the database are mostly submitted at a particular site, the corresponding set of fragments can be allocated at that site only.
Data that is accessed at multiple sites can be replicated at those sites.
If many updates are performed, it may be useful to limit replication.

Example of Fragmentation, Allocation, and Replication

Types of Distributed Database Systems
The main thing that all distributed database systems have in common is the fact that data and software are distributed over multiple sites connected by some form of communication network.

Criteria and Factors that Distinguish DDBMSs
The first factor we consider is the degree of homogeneity of the DDBMS software. If all servers (or individual local DBMSs) use identical software and all users (clients) use identical software, the DDBMS is called homogeneous; otherwise, it is called heterogeneous.

Criteria and Factors that Distinguish DDBMSs
Another factor related to the degree of homogeneity is the degree of local autonomy. If there is no provision for the local site to function as a stand-alone DBMS, then the system has no local autonomy.
On the other hand, if direct access by local transactions to a server is permitted, the system has some degree of local autonomy.

Criteria and Factors that Distinguish DDBMSs
At one extreme of the autonomy spectrum, we have a DDBMS that looks like a centralized DBMS to the user.
A single conceptual schema exists, and all access to the system is obtained through a site that is part of the DDBMS, which means that no local autonomy exists.

Criteria and Factors that Distinguish DDBMSs
At the other extreme we encounter a type of DDBMS called a federated DDBMS (or multidatabase system). In such a system, each server is an independent and autonomous centralized DBMS that has its own local users, local transactions, and DBA, and hence has a very high degree of local autonomy.
The term federated database system (FDBS) is used when there is some global view or schema of the federation of databases that is shared by the applications. On the other hand, a multidatabase system does not have a global schema and interactively constructs one as needed by the application.

Criteria and Factors that Distinguish DDBMSs
In a heterogeneous FDBS, one server may be a relational DBMS, another a network DBMS, and a third an object or hierarchical DBMS; in such a case, it is necessary to have a canonical system language and to include language translators to translate subqueries from the canonical language to the language of each server.

Federated Database Management Systems Issues – Differences in data models
Databases in an organization come from a variety of data models, including the so-called legacy models (network and hierarchical), the relational data model, the object data model, and even files.
The modeling capabilities of the models vary. Hence, to deal with them uniformly via a single global schema or to process them in a single language is challenging.
Even if two databases are both from the RDBMS environment, the same information may be represented as an attribute name, as a relation name, or as a value in different databases. This calls for an intelligent query-processing mechanism that can relate information based on metadata.

Federated Database Management Systems Issues – Differences in constraints
Constraint facilities for specification and implementation vary from system to system. There are comparable features that must be reconciled in the construction of a global schema.
For example, the relationships from ER models are represented as referential integrity constraints in the relational model. Triggers may have to be used to implement certain constraints in the relational model.
The global schema must also deal with potential conflicts among constraints.

Federated Database Management Systems Issues – Differences in query languages
Even with the same data model, the languages and their versions vary.
For example, SQL has multiple versions like SQL-89, SQL-92, and SQL-99, and each system has its own set of data types, comparison operators, string manipulation features, and so on.

Federated Database Management Systems Issues – Semantic Heterogeneity
Semantic heterogeneity occurs when there are differences in the meaning, interpretation, and intended use of the same or related data.
Semantic heterogeneity among component database systems (DBSs) creates the biggest hurdle in designing global schemas of heterogeneous databases.
The design autonomy of component DBSs refers to their freedom in choosing design parameters, which in turn affect the eventual complexity of the FDBS.

Query Processing in Distributed Databases
Data Transfer Costs of Distributed Query Processing
The first consideration is the cost of transferring data over the network. This data includes intermediate files that are transferred to other sites for further processing, as well as the final result files that may have to be transferred to the site where the query result is needed.

Query Processing in Distributed Databases
Data Transfer Costs of Distributed Query Processing
Although these costs may not be very high if the sites are connected via a high-performance local area network, they become quite significant in other types of networks.
Hence, DDBMS query optimization algorithms consider the goal of reducing the amount of data transfer as an optimization criterion in choosing a distributed query execution strategy.

Data Transfer Costs of Distributed Query Processing – Example
Query Q: For each employee, retrieve the employee name and the name of the department for which the employee works.
This can be stated as follows in the relational algebra:

$$Q:\ \pi_{\text{Fname, Lname, Dname}}\bigl(\text{EMPLOYEE} \bowtie_{\text{Dno}=\text{Dnumber}} \text{DEPARTMENT}\bigr)$$

Data Transfer Costs of Distributed Query Processing – Site 1: EMPLOYEE
10,000 records, each 100 bytes long; the Ssn field is 9 bytes long, the Fname field is 15 bytes, the Lname field is 15 bytes, and the Dno field is 4 bytes.

Data Transfer Costs of Distributed Query Processing – Site 2: DEPARTMENT
100 records, each 35 bytes long; the Dnumber field is 4 bytes long and the Dname field is 10 bytes long.

Data Transfer Costs of Distributed Query Processing – Example
The result of this query Q includes 10,000 records, assuming that every employee is related to a department. Suppose that each record in the query result is 40 bytes long.

Three simple strategies for executing distributed query Q (the query is submitted at site 3, which holds neither relation):
1. Transfer both the EMPLOYEE and the DEPARTMENT relations to the result site (site 3), and perform the join there; 1,000,000 + 3,500 = 1,003,500 bytes must be transferred.
2. Transfer the EMPLOYEE relation to site 2, execute the join at site 2, and send the result to site 3; 400,000 + 1,000,000 = 1,400,000 bytes must be transferred.

Three simple strategies for executing distributed query Q (contd)
3. Transfer the DEPARTMENT relation to site 1, execute the join at site 1, and send the result to site 3; 3,500 + 400,000 = 403,500 bytes must be transferred.
If minimizing the amount of data transfer is our optimization criterion, we should choose strategy 3.

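The arithmetic behind this choice is easy to recheck programmatically. Below is a minimal sketch that recomputes the transfer cost of each strategy from the sizes given above (EMPLOYEE at site 1, DEPARTMENT at site 2, query submitted at site 3); it also covers the variant query Q′ introduced on the next slide, and the strategy labels are ours.

```python
# A minimal sketch that recomputes the data-transfer arithmetic of this
# example: EMPLOYEE is at site 1, DEPARTMENT at site 2, and the query
# is submitted at site 3. Byte counts follow the figures above.
EMPLOYEE_BYTES = 10_000 * 100    # 1,000,000 bytes at site 1
DEPARTMENT_BYTES = 100 * 35      # 3,500 bytes at site 2
RESULT_BYTES = {"Q": 10_000 * 40, "Q_prime": 100 * 40}

def strategy_costs(result_bytes):
    return {
        # 1. Ship both relations to site 3 and join there.
        "transfer both to site 3": EMPLOYEE_BYTES + DEPARTMENT_BYTES,
        # 2. Ship EMPLOYEE to site 2, join there, ship the result to site 3.
        "join at site 2": EMPLOYEE_BYTES + result_bytes,
        # 3. Ship DEPARTMENT to site 1, join there, ship the result to site 3.
        "join at site 1": DEPARTMENT_BYTES + result_bytes,
    }

for query, result_bytes in RESULT_BYTES.items():
    best = min(strategy_costs(result_bytes).items(), key=lambda kv: kv[1])
    print(query, best)
# Q -> ('join at site 1', 403500); Q_prime -> ('join at site 1', 7500)
```
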
Data Transfer Costs of Distributed Query Processing – Example
Query Q′: For each department, retrieve the department name and the name of the department manager.
Again, suppose that the query is submitted at site 3. The same three strategies for executing query Q apply to Q′, except that the result of Q′ includes only 100 records, assuming that each department has a manager.

Executing Query Q′
1. Transfer both relations to the result site (site 3); 1,000,000 + 3,500 = 1,003,500 bytes must be transferred.
2. Transfer the EMPLOYEE relation to site 2, execute the join at site 2, and send the result to site 3; 4,000 + 1,000,000 = 1,004,000 bytes must be transferred.

Executing Query Q′ (contd)
3. Transfer the DEPARTMENT relation to site 1, execute the join at site 1, and send the result to site 3; 4,000 + 3,500 = 7,500 bytes must be transferred.
Again, we would choose strategy 3, in this case by an overwhelming margin over strategies 1 and 2. The preceding three strategies are the most obvious ones for the case where the result site (site 3) is different from all the sites that contain files involved in the query (sites 1 and 2).

Query Processing in Distributed Databases
However, suppose that the result site is site 2; then we have two simple strategies:
1. Transfer the EMPLOYEE relation to site 2, execute the query there, and present the result to the user at site 2; 1,000,000 bytes must be transferred for both Q and Q′.
2. Transfer the DEPARTMENT relation to site 1, execute the query at site 1, and send the result back to site 2; 400,000 + 3,500 = 403,500 bytes must be transferred for Q, and 4,000 + 3,500 = 7,500 bytes for Q′.

Distributed Query Processing Using Semijoin
The idea behind distributed query processing using the semijoin operation is to reduce the number of tuples in a relation before transferring it to another site.

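For reference, the semijoin itself can be written in relational algebra. The slide does not state the definition, but a standard formulation is:

$$R \ltimes_{A=B} S \;=\; \pi_{\text{attrs}(R)}\bigl(R \bowtie_{A=B} S\bigr)$$

that is, the tuples of R that join with at least one tuple of S on the condition A = B.
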
Semijoin Procedure
First, send the joining column of one relation R to the site where the other relation S is located; the joining column of R is then joined with S.
Following that, the join attributes, along with the attributes required in the result, are projected out and shipped back to the original site and joined with R.

Semijoin Procedure
Hence, only the joining column of R is transferred in one direction, and a subset of S with no extraneous tuples or attributes is transferred in the other direction.

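The following is a minimal in-memory sketch of this transfer pattern, assuming relations represented as lists of dictionaries; "shipping" between sites is simulated by passing values between steps, and the attribute names are illustrative.

```python
# A minimal in-memory sketch of the semijoin transfer pattern described
# above. Relations are lists of dicts; "shipping" is just passing data
# between steps. Attribute names are illustrative.

def semijoin_transfer(R, S, r_col, s_col, result_attrs):
    """Join R (local) with S (remote), shipping as little as possible."""
    # Step 1: ship only R's joining column to S's site.
    join_values = {t[r_col] for t in R}
    # Step 2: at S's site, keep the S tuples that match, projected onto
    # the join attribute plus the attributes needed in the result.
    reduced_S = [
        {k: t[k] for k in (s_col, *result_attrs)}
        for t in S if t[s_col] in join_values
    ]
    # Step 3: ship reduced_S back and complete the join with R.
    return [
        {**r, **s}
        for r in R for s in reduced_S if r[r_col] == s[s_col]
    ]

EMPLOYEE = [{"Fname": "John", "Lname": "Smith", "Dno": 5}]
DEPARTMENT = [{"Dname": "Research", "Dnumber": 5},
              {"Dname": "HQ", "Dnumber": 1}]
print(semijoin_transfer(EMPLOYEE, DEPARTMENT, "Dno", "Dnumber", ("Dname",)))
# [{'Fname': 'John', 'Lname': 'Smith', 'Dno': 5, 'Dnumber': 5, 'Dname': 'Research'}]
```
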
Using Semijoin to Execute Queries Q and Q′
The objective is to reduce the number of tuples in a relation before transferring it to another site. An example execution of Q or Q′:
1. Project the join attributes of DEPARTMENT at site 2 and transfer them to site 1: for Q, transfer F = π_Dnumber(DEPARTMENT) (4 × 100 = 400 bytes); for Q′, transfer F′ = π_Mgr_ssn(DEPARTMENT) (9 × 100 = 900 bytes).
2. Join the transferred file with EMPLOYEE at site 1, and transfer the required attributes of the result back to site 2: for Q, R = π_{Dno, Fname, Lname}(F ⋈_{Dnumber=Dno} EMPLOYEE) (34 × 10,000 = 340,000 bytes); for Q′, R′ = π_{Mgr_ssn, Fname, Lname}(F′ ⋈_{Mgr_ssn=Ssn} EMPLOYEE) (39 × 100 = 3,900 bytes).
3. Execute the query by joining the transferred file R or R′ with DEPARTMENT, and present the result to the user at site 2.

Using Semijoin to Execute Queries Q and Q′
For query Q, this turned out to include all EMPLOYEE tuples, so little improvement was achieved. However, for Q′ only 100 out of the 10,000 EMPLOYEE tuples were needed.

Query and Update Decomposition
In a DDBMS with no distribution transparency, the user phrases a query directly in terms of specific fragments.
For example, consider the query Q: Retrieve the names and hours per week for each employee who works on some project controlled by department 5. This is specified on the distributed database where the relations at sites 2 and 3 are shown in Figure 5, and those at site 1 are shown in Figure 4.

Query and Update Decomposition
A user who submits such a query must specify whether it references the PROJS5 and WORKS_ON5 relations at site 2 (Figure 5) or the PROJECT and WORKS_ON relations at site 1 (Figure 4).
The user must also maintain consistency of replicated data items when updating a DDBMS with no replication transparency.

Query and Update Decomposition
On the other hand, a DDBMS that supports full distribution, fragmentation, and replication transparency allows the user to specify a query or update request on the schema of Figure 2 just as though the DBMS were centralized.
For updates, the DDBMS is responsible for maintaining consistency among replicated items by using one of the distributed concurrency control algorithms.

Query and Update Decomposition
For queries, a query decomposition module must break up or decompose a query into subqueries that can be executed at the individual sites.
Additionally, a strategy for combining the results of the subqueries to form the query result must be generated.

Query and Update Decomposition
Whenever the DDBMS determines that an item referenced in the query is replicated, it must choose (or materialize) a particular replica during query execution.

Query and Update Decomposition
To determine which replicas include the data items referenced in a query, the DDBMS refers to the fragmentation, replication, and distribution information stored in the DDBMS catalog.
For vertical fragmentation, the attribute list for each fragment is kept in the catalog.

Query and Update Decomposition
For horizontal fragmentation, a condition, sometimes called a guard, is kept for each fragment.
This is basically a selection condition that specifies which tuples exist in the fragment; it is called a guard because only tuples that satisfy this condition are permitted to be stored in the fragment.

Query and Update Decomposition
For mixed fragments, both the attribute list and the guard condition are kept in the catalog.

Query and Update Decomposition
In our earlier example, the guard conditions for the fragments at site 1 (Figure 4) are TRUE (all tuples), and the attribute lists include all attributes. For the fragments shown in Figure 5, we have the guard conditions and attribute lists shown in Figure 9.

Query and Update Decomposition
When the DDBMS decomposes an update request, it can determine which fragments must be updated by examining their guard conditions.
For example, a user request to insert a new EMPLOYEE tuple <'Alex', 'B', 'Coleman', '345…'>

Query and Update Decomposition
would be decomposed by the DDBMS into two insert requests: the first inserts the preceding tuple in the EMPLOYEE fragment at site 1, and the second inserts it in the fragment whose guard condition the tuple's Dno value satisfies (here EMPD4 at site 3, assuming Dno = 4).

Guard Conditions and Attribute Lists – Fragments at Site 2

EMPD5
attribute list: Fname, Minit, Lname, Ssn, Salary, Super_ssn, Dno
guard condition: Dno = 5

DEP5
attribute list: * (all attributes: Dname, Dnumber, Mgr_ssn, Mgr_start_date)
guard condition: Dnumber = 5

DEP5_LOCS
attribute list: * (all attributes: Dnumber, Location)
guard condition: Dnumber = 5

PROJS5
attribute list: * (all attributes: Pname, Pnumber, Plocation, Dnum)
guard condition: Dnum = 5

WORKS_ON5
attribute list: * (all attributes: Essn, Pno, Hours)
guard condition: Essn IN (π_Ssn(EMPD5)) OR Pno IN (π_Pnumber(PROJS5))

Guard Conditions and Attribute Lists – Fragments at Site 3

EMPD4
attribute list: Fname, Minit, Lname, Ssn, Salary, Super_ssn, Dno
guard condition: Dno = 4

DEP4
attribute list: * (all attributes: Dname, Dnumber, Mgr_ssn, Mgr_start_date)
guard condition: Dnumber = 4

DEP4_LOCS
attribute list: * (all attributes: Dnumber, Location)
guard condition: Dnumber = 4

PROJS4
attribute list: * (all attributes: Pname, Pnumber, Plocation, Dnum)
guard condition: Dnum = 4

WORKS_ON4
attribute list: * (all attributes: Essn, Pno, Hours)
guard condition: Essn IN (π_Ssn(EMPD4)) OR Pno IN (π_Pnumber(PROJS4))

Query and Update Decomposition
For query decomposition, the DDBMS can determine which fragments may contain the required tuples by comparing the query condition with the guard conditions.

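A minimal sketch of this comparison is shown below, assuming a catalog that stores one guard predicate per horizontal fragment; the fragment names follow the figures above, while the catalog structure itself is our assumption.

```python
# A minimal sketch of guard-based decomposition: each horizontal
# fragment carries a guard predicate, and an insert (or a selection)
# is routed to every fragment whose guard the tuple can satisfy.
catalog = {
    # fragment: (site, guard predicate over an EMPLOYEE tuple)
    "EMPLOYEE": ("site1", lambda t: True),          # whole relation
    "EMPD5":    ("site2", lambda t: t["Dno"] == 5),
    "EMPD4":    ("site3", lambda t: t["Dno"] == 4),
}

def decompose_insert(tuple_):
    """Return the (fragment, site) pairs that must receive the insert."""
    return [(frag, site)
            for frag, (site, guard) in catalog.items()
            if guard(tuple_)]

new_emp = {"Fname": "Alex", "Lname": "Coleman", "Dno": 4}
print(decompose_insert(new_emp))
# [('EMPLOYEE', 'site1'), ('EMPD4', 'site3')]
```
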
Concurrency Control and Recovery
Distributed databases encounter a number of concurrency control and recovery problems that are not present in centralized databases. Some of them are listed below:
– Dealing with multiple copies of data items
– Failure of individual sites
– Communication link failure
– Distributed commit
– Distributed deadlock

Concurrency Control and Recovery in Distributed Databases
Dealing with multiple copies of the data items:
The concurrency control method is responsible for maintaining consistency among these copies of data.
The recovery method is responsible for making a copy consistent with other copies if the site on which the copy is stored fails and recovers later.

Concurrency Control and Recovery in Distributed Databases
Failure of individual sites:
The DDBMS should continue to operate with its running sites, if possible, when one or more individual sites fail.
When a site recovers, its local database must be brought up to date with the rest of the sites before it rejoins the system.

Concurrency Control and Recovery in Distributed Databases
Failure of communication links:
The system must be able to deal with the failure of one or more of the communication links that connect the sites.
An extreme case of this problem is that network partitioning may occur. This breaks up the sites into two or more partitions, where the sites within each partition can communicate only with one another and not with sites in other partitions.

Concurrency Control and Recovery in Distributed Databases
Distributed commit:
Problems can arise with committing a transaction that is accessing databases stored on multiple sites if some sites fail during the commit process.
This requires a two-phase or three-phase commit approach for transaction commit.

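As an illustration of the voting idea in two-phase commit, here is a compact coordinator-side sketch; the Participant class and its prepare/commit/abort methods are hypothetical stand-ins for real site interfaces, not an actual protocol implementation.

```python
# A compact sketch of two-phase commit from the coordinator's side.
# Participant objects and their prepare/commit/abort methods are
# hypothetical stand-ins for real site interfaces.

class Participant:
    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy

    def prepare(self) -> bool:
        """Phase 1: vote YES only if this site can guarantee the commit."""
        return self.healthy

    def commit(self):
        print(f"{self.name}: committed")

    def abort(self):
        print(f"{self.name}: aborted")

def two_phase_commit(participants):
    # Phase 1 (voting): collect votes from every site.
    if all(p.prepare() for p in participants):
        # Phase 2 (decision): unanimous YES -> commit everywhere.
        for p in participants:
            p.commit()
        return "committed"
    # Any NO vote (e.g. a failed site) -> abort everywhere.
    for p in participants:
        p.abort()
    return "aborted"

print(two_phase_commit([Participant("site1"), Participant("site2")]))
print(two_phase_commit([Participant("site1"), Participant("site3", healthy=False)]))
```
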
Concurrency Control and Recovery in Distributed Databases
Distributed deadlock:
Since transactions are processed at multiple sites, two or more sites may get involved in a deadlock. This must be resolved in a distributed manner.

Techniques for Solving Recovery and Concurrency Control Issues in DDBMSs
A common technique is to designate a particular copy of each data item as a distinguished copy.
The locks for this data item are associated with the distinguished copy, and all locking and unlocking requests are sent to the site that contains that copy.

Concurrency Control and Recovery
Distributed concurrency control based on a distinguished copy of a data item:
– Primary site technique: A single site is designated as the primary site, which serves as the coordinator for transaction management.

Primary Site Technique
Transaction management:
Concurrency control and commit are managed by this site.
In two-phase locking, this site manages the locking and releasing of data items. If all transactions follow the two-phase policy at all sites, then serializability is guaranteed.

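A minimal sketch of primary-site locking follows: every lock and unlock request, for any copy of any item, is handled by one designated site's lock table. The data structures and names are assumptions for illustration, not a real API.

```python
# A minimal sketch of primary-site locking: one designated site holds
# the lock table for every data item in the system.
class PrimarySiteLockManager:
    def __init__(self):
        self.lock_table = {}   # item -> transaction id holding the lock

    def lock(self, item, txn):
        """Grant an exclusive lock if the item is free (growing phase)."""
        if self.lock_table.get(item) in (None, txn):
            self.lock_table[item] = txn
            return True
        return False           # conflict: requester must wait or abort

    def unlock_all(self, txn):
        """Release every lock held by txn (shrinking phase of 2PL)."""
        for item in [i for i, t in self.lock_table.items() if t == txn]:
            del self.lock_table[item]

primary = PrimarySiteLockManager()         # lives at the primary site only
print(primary.lock("EMPLOYEE:123", "T1"))  # True
print(primary.lock("EMPLOYEE:123", "T2"))  # False: T1 holds the lock
primary.unlock_all("T1")
print(primary.lock("EMPLOYEE:123", "T2"))  # True
```
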
Concurrency Control and Recovery
Transaction Management
– Advantages:
An extension of centralized two-phase locking, so implementation and management are simple.
Data items are locked at only one site, but they can be accessed at any site.
– Disadvantages:
All transaction management activities go to the primary site, which is likely to overload the site.
If the primary site fails, the entire system is inaccessible.
– To aid recovery, a backup site is designated that behaves as a shadow of the primary site. In case of primary site failure, the backup site can act as the primary site.

Concurrency Control and Recovery
Primary copy technique:
– In this approach, instead of a whole site, a partition of the data items is designated as primary copies; to lock a data item, just the primary copy of that data item is locked.
Advantages:
– Since primary copies are distributed across various sites, a single site is not overloaded with locking and unlocking requests.
Disadvantages:
– Identifying a primary copy is complex. A distributed directory must be maintained, possibly at all sites.

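To contrast with the primary site technique, the sketch below routes each lock request through a directory that records where each item's primary copy lives; the directory contents are hypothetical.

```python
# A minimal sketch of the primary-copy variant: each item's primary
# copy may live at a different site, so lock requests are routed
# through a (possibly replicated) directory. Contents are hypothetical.
primary_copy_directory = {
    "EMPLOYEE:123": "site1",   # primary copy of this item is at site1
    "DEPARTMENT:5": "site2",
    "PROJECT:20":   "site3",
}

def lock_site_for(item: str) -> str:
    """Route a lock request to the site holding the item's primary copy."""
    return primary_copy_directory[item]

print(lock_site_for("DEPARTMENT:5"))  # site2
```
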
Concurrency Control and Recovery
Recovery from coordinator failure:
– In both approaches, a coordinator site or copy may become unavailable. This requires the selection of a new coordinator.
Primary site approach with no backup site:
– Abort and restart all active transactions at all sites; elect a new coordinator and initiate transaction processing.
Primary site approach with a backup site:
– Suspend all active transactions, designate the backup site as the primary site, and identify a new backup site. The new primary site receives all transaction management information to resume processing.
Primary and backup sites fail, or no backup site exists:
– Use an election process to select a new coordinator site.

Concurrency Control and Recovery
Concurrency control based on voting:
– There is no primary copy or coordinator.
– Send a lock request to the sites that have the data item.
– If a majority of the sites grant the lock, then the requesting transaction gets the data item.
– The locking information (grant or denied) is sent to all these sites.
– To avoid an unacceptably long wait, a time-out period is defined. If the requesting transaction does not get any vote information, then the transaction is aborted.

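A minimal simulation of this voting scheme is sketched below; the per-site grant decision is randomized purely for demonstration, where a real system would consult each site's lock table.

```python
# A minimal simulation of majority voting for a distributed lock: the
# requesting site polls every site that holds a copy of the item and
# obtains the lock only if a majority grants it. If no votes arrive
# before a time-out, the transaction would be aborted (not modeled).
import random

def site_grants(site, item):
    # Stand-in for a per-site lock-table check; randomized for the demo.
    return random.random() < 0.8

def request_lock(item, sites):
    """Return True if a majority of the copy-holding sites grant the lock."""
    grants = sum(1 for site in sites if site_grants(site, item))
    decision = grants > len(sites) // 2
    # The decision (granted or denied) is then sent to all these sites.
    return decision

random.seed(1)
print(request_lock("EMPLOYEE:123", ["site1", "site2", "site3"]))
```
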
End