Page 1
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/1
Outline
• Introduction
• Background
• Distributed Database Design
• Database Integration
• Semantic Data Control
• Distributed Query Processing
➡ Overview
➡ Query decomposition and localization
➡ Distributed query optimization
• Multidatabase query processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues

Query Processing in a DDBMS
High-level user query → query processor → low-level data manipulation commands for D-DBMS

Distributed Query Processing Methodology
Input: calculus query on distributed relations
Control site:
  1. Query decomposition (uses the global schema)
     ➡ algebraic query on distributed relations
  2. Data localization (uses the fragment schema)
     ➡ fragment query
  3. Global optimization (uses statistics on fragments)
     ➡ optimized fragment query with communication operations
Local sites:
  4. Local optimization (uses the local schemas)
     ➡ optimized local queries

Step 1 – Query Decomposition
Input: Calculus query on global relations
• Normalization
➡ manipulate query quantifiers and qualification
• Analysis
➡ detect and reject “incorrect” queries
➡ possible for only a subset of relational calculus
• Simplification
➡ eliminate redundant predicates
• Restructuring
➡ calculus query ⇒ algebraic query
➡ more than one translation is possible
➡ use transformation rules

Normalization
• Lexical and syntactic analysis
➡ check validity (similar to compilers)
➡ check for attributes and relations
➡ type checking on the qualification
• Put into normal form
➡ Conjunctive normal form
(p11 ∨ p12 ∨ … ∨ p1n) ∧ … ∧ (pm1 ∨ pm2 ∨ … ∨ pmn)
➡ Disjunctive normal form
(p11 ∧ p12 ∧ … ∧ p1n) ∨ … ∨ (pm1 ∧ pm2 ∧ … ∧ pmn)
➡ ORs mapped into union
➡ ANDs mapped into join or selection
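
Putting a qualification into conjunctive normal form can be sketched as follows. This is only an illustration under stated assumptions: predicates are encoded as nested tuples, and the function names `push_not`, `to_cnf` and `normalize` are hypothetical, not part of any DBMS.

```python
# Assumed encoding: ("and", a, b), ("or", a, b), ("not", a), or an atom given as a string.

def push_not(p, negate=False):
    """Move negations inward with De Morgan's laws until they sit on atoms."""
    if isinstance(p, str):
        return ("not", p) if negate else p
    op = p[0]
    if op == "not":
        return push_not(p[1], not negate)
    if negate:  # under a negation, AND and OR swap roles
        op = "or" if op == "and" else "and"
    return (op, push_not(p[1], negate), push_not(p[2], negate))

def to_cnf(p):
    """Distribute OR over AND; expects negations already pushed onto atoms."""
    if isinstance(p, str) or p[0] == "not":
        return p
    left, right = to_cnf(p[1]), to_cnf(p[2])
    if p[0] == "and":
        return ("and", left, right)
    if not isinstance(left, str) and left[0] == "and":
        return ("and", to_cnf(("or", left[1], right)), to_cnf(("or", left[2], right)))
    if not isinstance(right, str) and right[0] == "and":
        return ("and", to_cnf(("or", left, right[1])), to_cnf(("or", left, right[2])))
    return ("or", left, right)

def normalize(p):
    return to_cnf(push_not(p))

# a OR (b AND c) becomes (a OR b) AND (a OR c)
cnf = normalize(("or", "a", ("and", "b", "c")))
```

The disjunctive normal form is obtained symmetrically by distributing AND over OR.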

Analysis
• Refute incorrect queries
• Type incorrect
➡ If any of its attribute or relation names are not defined in the global schema
➡ If operations are applied to attributes of the wrong type
• Semantically incorrect
➡ Components do not contribute in any way to the generation of the result
➡ Only a subset of relational calculus queries can be tested for correctness
✦ Those that do not contain disjunction and negation
➡ To detect
✦ connection graph (query graph)
✦ join graph

Analysis – Example
SELECT ENAME, RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND PNAME = "CAD/CAM"
AND DUR ≥ 36
AND TITLE = "Programmer"
Query graph
➡ Nodes: EMP, ASG, PROJ, RESULT
➡ Join edges: EMP – ASG (EMP.ENO=ASG.ENO); ASG – PROJ (ASG.PNO=PROJ.PNO)
➡ Projection edges: EMP → RESULT (ENAME); ASG → RESULT (RESP)
➡ Selections: TITLE=“Programmer” on EMP; DUR≥36 on ASG; PNAME=“CAD/CAM” on PROJ
Join graph
➡ EMP – ASG (EMP.ENO=ASG.ENO); ASG – PROJ (ASG.PNO=PROJ.PNO)

Analysis
If the query graph is not connected, the query may be wrong or use a Cartesian product.
SELECT ENAME, RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND PNAME = "CAD/CAM"
AND DUR > 36
AND TITLE = "Programmer"
Query graph
➡ Join edge: EMP – ASG
➡ Projection edges: EMP → RESULT (ENAME); ASG → RESULT (RESP)
➡ PROJ carries only the selection PNAME=“CAD/CAM” and is connected to no other node
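
The connectivity test on the query graph can be sketched with a breadth-first traversal. The encoding below (node names as strings, join and projection edges as pairs) and the function name `is_connected` are assumptions made for this illustration.

```python
from collections import deque

def is_connected(nodes, edges):
    """Return True if every node is reachable from the first one."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = nodes[0]
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in adjacency[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == set(nodes)

nodes = ["EMP", "ASG", "PROJ", "RESULT"]
# Selections are attached to a single node and add no edge.
correct_query = [("EMP", "ASG"), ("ASG", "PROJ"), ("EMP", "RESULT"), ("ASG", "RESULT")]
missing_join = [("EMP", "ASG"), ("EMP", "RESULT"), ("ASG", "RESULT")]

ok = is_connected(nodes, correct_query)
suspicious = not is_connected(nodes, missing_join)
```

With the join predicate ASG.PNO = PROJ.PNO missing, PROJ is isolated, so the second query is flagged.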

Simplification
• Why simplify?
➡ Remember the example
• How? Use transformation rules
➡ Elimination of redundancy
✦ idempotency rules
p1 ∧ ¬(p1) ⇔ false
p1 ∧ (p1 ∨ p2) ⇔ p1
p1 ∨ false ⇔ p1
…
➡ Application of transitivity
➡ Use of integrity rules

Simplification – Example
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = "J. Doe"
OR (NOT(EMP.TITLE = "Programmer")
AND (EMP.TITLE = "Programmer"
OR EMP.TITLE = "Elect. Eng.")
AND NOT(EMP.TITLE = "Elect. Eng."))
⇓
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = "J. Doe"
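
One way to convince oneself that the two qualifications are equivalent is an exhaustive truth-table check. The sketch below treats the three simple predicates as independent Boolean variables, which is an assumption of the illustration; the function names are hypothetical.

```python
from itertools import product

def original(name_is_doe, is_programmer, is_elect_eng):
    # ENAME = "J. Doe" OR (NOT p2 AND (p2 OR p3) AND NOT p3)
    return name_is_doe or (
        not is_programmer
        and (is_programmer or is_elect_eng)
        and not is_elect_eng
    )

def simplified(name_is_doe, is_programmer, is_elect_eng):
    # ENAME = "J. Doe"
    return name_is_doe

# Compare both forms on all 8 truth assignments.
equivalent = all(
    original(*values) == simplified(*values)
    for values in product([False, True], repeat=3)
)
```

The parenthesized part expands to (¬p2 ∧ p2 ∧ ¬p3) ∨ (¬p2 ∧ p3 ∧ ¬p3), and both disjuncts are false by the rule p ∧ ¬p ⇔ false.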

Restructuring
• Convert relational calculus to relational algebra
• Make use of query trees
• Example
Find the names of employees other than J. Doe who worked on the CAD/CAM project for either 1 or 2 years.
SELECT ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME ≠ "J. Doe"
AND PNAME = "CAD/CAM"
AND (DUR = 12 OR DUR = 24)
Query tree (π = Project, σ = Select, ⋈ = Join)
πENAME
  σDUR=12 ∨ DUR=24
    σPNAME=“CAD/CAM”
      σENAME≠“J. Doe”
        ⋈PNO
          PROJ
          ⋈ENO
            ASG
            EMP

Restructuring – Transformation Rules
• Commutativity of binary operations
➡ R × S ⇔ S × R
➡ R ⋈ S ⇔ S ⋈ R
➡ R ∪ S ⇔ S ∪ R
• Associativity of binary operations
➡ (R × S) × T ⇔ R × (S × T)
➡ (R ⋈ S) ⋈ T ⇔ R ⋈ (S ⋈ T)
• Idempotence of unary operations
➡ πA'(πA"(R)) ⇔ πA'(R)
➡ σp1(A1)(σp2(A2)(R)) ⇔ σp1(A1) ∧ p2(A2)(R)
where R[A] and A' ⊆ A, A" ⊆ A and A' ⊆ A"
• Commuting selection with projection

Restructuring – Transformation Rules
• Commuting selection with binary operations
➡ σp(A)(R × S) ⇔ (σp(A)(R)) × S
➡ σp(Ai)(R ⋈(Aj,Bk) S) ⇔ (σp(Ai)(R)) ⋈(Aj,Bk) S
➡ σp(Ai)(R ∪ T) ⇔ σp(Ai)(R) ∪ σp(Ai)(T)
where Ai belongs to R and T
• Commuting projection with binary operations
➡ πC(R × S) ⇔ πA'(R) × πB'(S)
➡ πC(R ⋈(Aj,Bk) S) ⇔ πA'(R) ⋈(Aj,Bk) πB'(S)
➡ πC(R ∪ S) ⇔ πC(R) ∪ πC(S)
where R[A] and S[B]; C = A' ∪ B' where A' ⊆ A, B' ⊆ B

Example
Recall the previous example:
Find the names of employees other than J. Doe who worked on the CAD/CAM project for either one or two years.
SELECT ENAME
FROM PROJ, ASG, EMP
WHERE ASG.ENO=EMP.ENO
AND ASG.PNO=PROJ.PNO
AND ENAME ≠ "J. Doe"
AND PROJ.PNAME="CAD/CAM"
AND (DUR=12 OR DUR=24)
Query tree (π = Project, σ = Select, ⋈ = Join)
πENAME
  σDUR=12 ∨ DUR=24
    σPNAME=“CAD/CAM”
      σENAME≠“J. Doe”
        ⋈PNO
          PROJ
          ⋈ENO
            ASG
            EMP

Equivalent Query
πENAME
  σPNAME=“CAD/CAM” ∧ (DUR=12 ∨ DUR=24) ∧ ENAME≠“J. Doe”
    ⋈PNO,ENO
      ×
        PROJ
        EMP
      ASG

Restructuring
πENAME
  ⋈PNO
    πPNO
      σPNAME = "CAD/CAM"
        PROJ
    πPNO,ENAME
      ⋈ENO
        πPNO,ENO
          σDUR=12 ∨ DUR=24
            ASG
        πENO,ENAME
          σENAME ≠ "J. Doe"
            EMP
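
The effect of restructuring can be checked on a small instance: evaluating the query with joins first, and then with selections and projections pushed toward the leaves, should give the same answer. The sketch below uses a toy in-memory relational algebra; the tuples and the helper names `select`, `project` and `join` are hypothetical, since the slides list no data.

```python
# Hypothetical sample data, chosen only for illustration.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Syst. Anal."},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Mech. Eng."},
]
ASG = [
    {"ENO": "E1", "PNO": "P3", "DUR": 12},
    {"ENO": "E2", "PNO": "P3", "DUR": 24},
    {"ENO": "E3", "PNO": "P3", "DUR": 36},
    {"ENO": "E3", "PNO": "P2", "DUR": 12},
]
PROJ = [
    {"PNO": "P2", "PNAME": "Database Develop."},
    {"PNO": "P3", "PNAME": "CAD/CAM"},
]

def select(rel, pred):
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection with duplicate elimination."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def join(r, s, attr):
    """Equi-join of r and s on a single shared attribute."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]

not_doe = lambda t: t["ENAME"] != "J. Doe"
cad_cam = lambda t: t["PNAME"] == "CAD/CAM"
one_or_two_years = lambda t: t["DUR"] in (12, 24)

# Initial tree: joins first, then the three selections, then the projection.
joined = join(PROJ, join(ASG, EMP, "ENO"), "PNO")
initial = project(
    select(select(select(joined, not_doe), cad_cam), one_or_two_years), ["ENAME"]
)

# Restructured tree: selections and projections pushed below the joins.
proj_side = project(select(PROJ, cad_cam), ["PNO"])
asg_side = project(select(ASG, one_or_two_years), ["PNO", "ENO"])
emp_side = project(select(EMP, not_doe), ["ENO", "ENAME"])
inner = project(join(asg_side, emp_side, "ENO"), ["PNO", "ENAME"])
restructured = project(join(proj_side, inner, "PNO"), ["ENAME"])
```

Both plans return the same relation, but the restructured one joins smaller intermediate relations.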

Step 2 – Data Localization
Input: Algebraic query on distributed relations
• Determine which fragments are involved
• Localization program
➡ replace each global relation in the query by its localization program to obtain the localized query
✦ A localized query is a relational algebra query whose operands are the fragments of relations instead of the relations themselves
✦ A localization program is the relational algebra expression that reconstructs a relation from its fragments
✓ Union for horizontal fragmentation; Join for vertical fragmentation
✦ Replication is not taken into account in this chapter
➡ Optimize
✦ For each type of fragmentation, use reduction techniques to generate simpler queries
✦ To do so, use appropriate heuristics

Example
Assume
➡ EMP is fragmented into EMP1, EMP2, EMP3 as follows:
✦ EMP1 = σENO≤“E3”(EMP)
✦ EMP2 = σ“E3”<ENO≤“E6”(EMP)
✦ EMP3 = σENO>“E6”(EMP)
➡ ASG is fragmented into ASG1 and ASG2 as follows:
✦ ASG1 = σENO≤“E3”(ASG)
✦ ASG2 = σENO>“E3”(ASG)
Replace EMP by (EMP1 ∪ EMP2 ∪ EMP3) and ASG by (ASG1 ∪ ASG2) in any query
πENAME
  σDUR=12 ∨ DUR=24
    σPNAME=“CAD/CAM”
      σENAME≠“J. Doe”
        ⋈PNO
          PROJ
          ⋈ENO
            EMP1 ∪ EMP2 ∪ EMP3
            ASG1 ∪ ASG2

Reduction for PHF
• Reduction with selection
➡ Relation R and FR = {R1, R2, …, Rw} where Rj = σpj(R)
σpi(Rj) = ∅ if ∀x in R: ¬(pi(x) ∧ pj(x))
➡ Example
SELECT *
FROM EMP
WHERE ENO = "E5"
σENO=“E5”(EMP1 ∪ EMP2 ∪ EMP3) ⇒ σENO=“E5”(EMP2)
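
The selection rule can be sketched by testing, for each fragment, whether any value satisfies both the query predicate and the fragment predicate. The sketch assumes a small finite ENO domain (E1 to E9, hypothetical), compares employee numbers as strings, and takes EMP3 as ENO > "E6" so that the fragments are disjoint; `relevant_fragments` is an illustrative name.

```python
# Fragment qualifications of EMP, written as predicates on ENO.
FRAGMENTS = {
    "EMP1": lambda eno: eno <= "E3",
    "EMP2": lambda eno: "E3" < eno <= "E6",
    "EMP3": lambda eno: eno > "E6",
}

# Assumed finite domain of employee numbers.
DOMAIN = [f"E{i}" for i in range(1, 10)]

def relevant_fragments(query_pred):
    """Keep a fragment unless its qualification contradicts the query predicate."""
    return [
        name
        for name, frag_pred in FRAGMENTS.items()
        if any(query_pred(x) and frag_pred(x) for x in DOMAIN)
    ]

kept = relevant_fragments(lambda eno: eno == "E5")
```

Only EMP2 survives for ENO = "E5", so the selection is sent to a single fragment.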

Reduction for PHF
• Reduction with join
➡ Possible if fragmentation is done on join attribute, i.e., the selection attribute used for the fragmentation is the same as the join attribute
➡ Algorithm
✦ Distribute joins over unions
(R1 ∪ R2) ⋈ S ⇔ (R1 ⋈ S) ∪ (R2 ⋈ S)
✦ Eliminate useless joins as follows: Given Ri = σpi(R) and Rj = σpj(R)
Ri ⋈ Rj = ∅ if ∀x in Ri, ∀y in Rj: ¬(pi(x) ∧ pj(y))
That is, the qualifications of the joined fragments are in contradiction

Reduction for PHF
• Assume EMP is fragmented as before and
➡ ASG1: σENO ≤ "E3"(ASG)
➡ ASG2: σENO > "E3"(ASG)
• Consider the query
SELECT *
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
• Distribute join over unions
• Apply the reduction rule
(EMP1 ∪ EMP2 ∪ EMP3) ⋈ENO (ASG1 ∪ ASG2)
⇒ (EMP1 ⋈ENO ASG1) ∪ (EMP2 ⋈ENO ASG2) ∪ (EMP3 ⋈ENO ASG2)
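
Distributing the join over the unions yields one candidate join per pair of fragments; the reduction rule then discards the pairs whose qualifications contradict each other on the join attribute. The following sketch assumes a finite ENO domain (E1 to E9, hypothetical) and takes EMP3 as ENO > "E6"; `useful_joins` is an illustrative name.

```python
from itertools import product

EMP_FRAGS = {
    "EMP1": lambda eno: eno <= "E3",
    "EMP2": lambda eno: "E3" < eno <= "E6",
    "EMP3": lambda eno: eno > "E6",
}
ASG_FRAGS = {
    "ASG1": lambda eno: eno <= "E3",
    "ASG2": lambda eno: eno > "E3",
}
DOMAIN = [f"E{i}" for i in range(1, 10)]  # assumed domain of ENO

def useful_joins(left, right, domain):
    """Keep a fragment pair only if some join value satisfies both qualifications."""
    return [
        (lname, rname)
        for (lname, lpred), (rname, rpred) in product(left.items(), right.items())
        if any(lpred(x) and rpred(x) for x in domain)
    ]

pairs = useful_joins(EMP_FRAGS, ASG_FRAGS, DOMAIN)
```

Of the six candidate pairs, three remain, matching the reduced query: EMP1 with ASG1, EMP2 with ASG2, and EMP3 with ASG2.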

Provides Parallelism
(EMP1 ⋈ENO ASG1) ∪ (EMP2 ⋈ENO ASG2) ∪ (EMP3 ⋈ENO ASG1) ∪ (EMP3 ⋈ENO ASG2)

Eliminates Unnecessary Work
(EMP1 ⋈ENO ASG1) ∪ (EMP2 ⋈ENO ASG2) ∪ (EMP3 ⋈ENO ASG2)

Reduction for VF
• Find useless (not empty) intermediate relations
Relation R defined over attributes A = {A1, ..., An} vertically fragmented as Ri = πA'(R) where A' ⊆ A:
πD,K(Ri) is useless if the set of projection attributes D is not in A'
Example: EMP1 = πENO,ENAME(EMP); EMP2 = πENO,TITLE(EMP)
SELECT ENAME
FROM EMP
πENAME(EMP1 ⋈ENO EMP2) ⇒ πENAME(EMP1)
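
For vertical fragmentation, one possible reading of the rule is that a fragment is worth keeping only if it supplies at least one projected attribute other than the key. The sketch below encodes that reading; the attribute sets come from the example, while `useful_fragments` and the fallback for key-only projections are assumptions of this illustration.

```python
FRAGMENTS = {
    "EMP1": {"ENO", "ENAME"},
    "EMP2": {"ENO", "TITLE"},
}
KEY = {"ENO"}  # the key is replicated in every vertical fragment

def useful_fragments(projection_attrs):
    """Return the fragments that contribute a non-key projected attribute."""
    needed = set(projection_attrs) - KEY
    kept = [name for name, attrs in FRAGMENTS.items() if attrs & needed]
    # If only key attributes are requested, any single fragment suffices.
    return kept or [next(iter(FRAGMENTS))]

for_ename = useful_fragments(["ENAME"])
for_both = useful_fragments(["ENAME", "TITLE"])
```

For SELECT ENAME FROM EMP only EMP1 is kept, so the join with EMP2 disappears from the localized query.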

Reduction for DHF
• Rule:
➡ Distribute joins over unions
➡ Apply the join reduction for horizontal fragmentation (using the qualification of the primary fragments!)
• Example
ASG1: ASG ⋉ENO EMP1
ASG2: ASG ⋉ENO EMP2
EMP1: σTITLE=“Programmer”(EMP)
EMP2: σTITLE≠“Programmer”(EMP)
• Query
SELECT *
FROM EMP, ASG
WHERE ASG.ENO = EMP.ENO
AND EMP.TITLE = "Mech. Eng."

Reduction for DHF
Generic query
(ASG1 ∪ ASG2) ⋈ENO σTITLE=“Mech. Eng.”(EMP1 ∪ EMP2)
Selections first
(ASG1 ∪ ASG2) ⋈ENO σTITLE=“Mech. Eng.”(EMP2)

Reduction for DHF
Joins over unions
(ASG1 ⋈ENO σTITLE=“Mech. Eng.”(EMP2)) ∪ (ASG2 ⋈ENO σTITLE=“Mech. Eng.”(EMP2))
Elimination of the empty intermediate relations (left sub-tree)
ASG2 ⋈ENO σTITLE=“Mech. Eng.”(EMP2)
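
The three steps for derived horizontal fragmentation can be sketched as follows. The sketch assumes that the primary fragments are disjoint, so a derived fragment ASGi can only join with the primary fragment EMPi it was derived from; `reduce_dhf` and the `DERIVED_FROM` mapping are illustrative names.

```python
# Qualifications of the primary fragments of EMP, as predicates on TITLE.
EMP_FRAGS = {
    "EMP1": lambda title: title == "Programmer",
    "EMP2": lambda title: title != "Programmer",
}
# ASGi = ASG semijoin EMPi: each derived fragment inherits its owner's qualification.
DERIVED_FROM = {"ASG1": "EMP1", "ASG2": "EMP2"}

def reduce_dhf(selected_title):
    """Return the (ASG fragment, EMP fragment) joins that remain after reduction."""
    # Selections first: drop primary fragments contradicting TITLE = selected_title.
    live_emp = [name for name, pred in EMP_FRAGS.items() if pred(selected_title)]
    # Joins over unions, then drop pairs not linked by the semijoin definition.
    return [
        (asg, emp)
        for asg, owner in DERIVED_FROM.items()
        for emp in live_emp
        if owner == emp
    ]

remaining = reduce_dhf("Mech. Eng.")
```

For TITLE = "Mech. Eng." the only remaining join is ASG2 with EMP2, as in the reduced query above.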

Reduction for Hybrid Fragmentation
• Combine the rules already specified:
➡ Remove empty relations generated by contradicting selections on horizontal fragments;
➡ Remove useless relations generated by projections on vertical fragments;
➡ Distribute joins over unions in order to isolate and remove useless joins.

Reduction for Hybrid Fragmentation
Example
Consider the following hybrid fragmentation:
EMP1 = σENO≤"E4"(πENO,ENAME(EMP))
EMP2 = σENO>"E4"(πENO,ENAME(EMP))
EMP3 = πENO,TITLE(EMP)
and the query
SELECT ENAME
FROM EMP
WHERE ENO = "E5"
πENAME(σENO=“E5”((EMP1 ∪ EMP2) ⋈ENO EMP3)) ⇒ πENAME(σENO=“E5”(EMP2))