CSE 636 Data Integration
SchemaSQL Implementation
Architecture
[Figure: SchemaSQL architecture. A federation user submits a SchemaSQL query to the SchemaSQL server. The server sends optimized local queries Q1 … Qn to DBMS1 … DBMSn and collects answer(Q1) … answer(Qn). It then passes a final series of SQL queries to the resident SQL engine, which computes the final answer returned to the user.]
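The flow in the architecture above can be sketched in Python with sqlite3. This is a minimal sketch under stated assumptions: in-memory SQLite databases stand in for DBMS1 … DBMSn and for the resident SQL engine, and the helper names make_db and run_federated and the ans_i table naming are hypothetical. The local queries are assumed to be already optimized.

```python
import sqlite3


def make_db(script):
    """Hypothetical helper: an in-memory SQLite database stands in for one member DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn


def run_federated(local_dbs, local_queries, final_sql):
    """Hypothetical server loop: send Qi to DBMSi, collect answer(Qi), finish on the resident engine.

    answer(Qi) is copied into the resident engine as table ans_i (an assumed
    naming scheme), so final_sql can refer to ans_1 ... ans_n.
    """
    resident = sqlite3.connect(":memory:")
    for i, (db, query) in enumerate(zip(local_dbs, local_queries), start=1):
        cur = db.execute(query)
        cols = [d[0] for d in cur.description]
        col_list = ", ".join(f'"{c}"' for c in cols)
        resident.execute(f"CREATE TABLE ans_{i} ({col_list})")
        marks = ", ".join("?" for _ in cols)
        resident.executemany(f"INSERT INTO ans_{i} VALUES ({marks})", cur.fetchall())
    # The "final series of SQL queries" is reduced to a single query in this sketch.
    return resident.execute(final_sql).fetchall()


db1 = make_db("CREATE TABLE t (x INTEGER); INSERT INTO t VALUES (1), (2);")
db2 = make_db("CREATE TABLE u (y INTEGER); INSERT INTO u VALUES (2), (3);")
result = run_federated(
    [db1, db2],
    ["SELECT x FROM t", "SELECT y FROM u"],
    "SELECT x FROM ans_1 JOIN ans_2 ON x = y",
)
print(result)  # [(2,)]
```

The member databases are only read. All combining work (here a join) happens on the resident engine after the answers are collected.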
SchemaSQL Server
• Maintains a Federation System Table (FST)
  – FST(db-name, rel-name, attr-name)
  – Names of databases, relations and attributes in the federation
• Compiles the instantiations of the variables in the query
• Enforces conditions, groupings, aggregations and mergings
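A Federation System Table can be compiled from the member databases' catalogs. The following is a minimal sketch under stated assumptions: SQLite databases stand in for the members, their catalogs are read through sqlite_master and PRAGMA table_info, and build_fst is a hypothetical helper. The columns are spelled db_name, rel_name, attr_name because hyphens are not valid in unquoted SQL identifiers. The last query shows how the instantiations of a relation-name variable over univ-C come straight from the FST.

```python
import sqlite3


def build_fst(federation):
    """Hypothetical helper: compile FST(db_name, rel_name, attr_name).

    federation maps a database name to an open sqlite3 connection
    (an assumption: real member DBMSs expose their catalogs differently).
    """
    rows = []
    for db_name, conn in federation.items():
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        for (rel_name,) in tables:
            # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
            for col in conn.execute(f'PRAGMA table_info("{rel_name}")'):
                rows.append((db_name, rel_name, col[1]))
    fst = sqlite3.connect(":memory:")
    fst.execute("CREATE TABLE FST (db_name TEXT, rel_name TEXT, attr_name TEXT)")
    fst.executemany("INSERT INTO FST VALUES (?, ?, ?)", rows)
    return fst


univ_c = sqlite3.connect(":memory:")
univ_c.executescript("""
    CREATE TABLE cs   (category TEXT, salFloor INTEGER);
    CREATE TABLE math (category TEXT, salFloor INTEGER);
""")
fst = build_fst({"univ-C": univ_c})

# Instantiations of a variable ranging over the relation names of univ-C
rel_c = [r for (r,) in fst.execute(
    "SELECT DISTINCT rel_name FROM FST WHERE db_name = ? ORDER BY rel_name",
    ("univ-C",),
)]
print(rel_c)  # ['cs', 'math']
```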
Query Processing
Phase 1
• For each set of variable declarations in the FROM clause, create a VIT using one or more SQL queries against some of the local databases and/or the FST
  – VIT: Variable Instantiation Table, whose schema consists of all the variables in one or more variable declarations in the FROM clause
Phase 2
• Rewrite the original SchemaSQL query against the federation into an “equivalent” query against the set of VIT relations, and compute it using the resident SQL server
Fixed Output Schema
Example
SELECT RelC, C.salFloor
FROM univ-C→ RelC, univ-C::RelC C, univ-D::salInfo D
WHERE RelC = D.dept
  AND C.salFloor > D.technician
  AND C.category = 'technician'
univ-C::cs
category     salFloor
Prof         74K
Assoc Prof   62K
…            …

univ-C::math
category     salFloor
Prof         67K
Assoc Prof   56K
…            …

univ-D::salInfo
dept   Prof   Assoc Prof   Asst Prof   …
cs     72K    65K          78K         …
math   65K    54K          69K         …
…      …      …            …           …
Example – Phase 1
• VITC(RelC, CsalFloor):
1. SELECT RelC FROM VITRelC
2. If {r1, …, rn} is the answer to step 1, VITC is computed by the following SQL query against univ-C:
   SELECT 'r1' AS RelC, salFloor AS CsalFloor FROM r1
   WHERE category = 'technician'
   UNION … UNION
   SELECT 'rn' AS RelC, salFloor AS CsalFloor FROM rn
   WHERE category = 'technician'
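Step 2 generates one SELECT branch per relation name returned by step 1. Here is a minimal sketch of that generation, assuming univ-C is an in-memory SQLite database loaded with the Prof and Assoc Prof rows from the example tables plus the technician salary floors shown for VITC (42K and 46K), with salaries stored as integers in $K. The step 1 answer is hard-coded rather than read from VITRelC. The helpers quote_ident, quote_literal and vitc_query are hypothetical.

```python
import sqlite3

# Stand-in for univ-C (salaries in $K).
univ_c = sqlite3.connect(":memory:")
univ_c.executescript("""
    CREATE TABLE cs   (category TEXT, salFloor INTEGER);
    CREATE TABLE math (category TEXT, salFloor INTEGER);
    INSERT INTO cs   VALUES ('Prof', 74), ('Assoc Prof', 62), ('technician', 42);
    INSERT INTO math VALUES ('Prof', 67), ('Assoc Prof', 56), ('technician', 46);
""")


def quote_ident(name):
    """Quote a relation name for use as an SQL identifier."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Quote a relation name for use as an SQL string constant."""
    return "'" + value.replace("'", "''") + "'"


def vitc_query(relations):
    """Hypothetical helper for step 2: one branch per ri, combined with UNION."""
    branches = [
        f"SELECT {quote_literal(r)} AS RelC, salFloor AS CsalFloor "
        f"FROM {quote_ident(r)} WHERE category = 'technician'"
        for r in relations
    ]
    return " UNION ".join(branches)


# Step 1 answer {r1, ..., rn}, hard-coded here instead of querying VITRelC.
relations = ["cs", "math"]
sql = vitc_query(relations)
vitc = sorted(univ_c.execute(sql).fetchall())
print(vitc)  # [('cs', 42), ('math', 46)]
```

Each branch turns a relation name into a data value through the constant 'ri' AS RelC. This is how the schema-level variable RelC becomes an ordinary column of VITC.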
Example – Phase 1
• VITD(Ddept, Dtechnician):
SELECT dept AS Ddept, technician AS Dtechnician FROM salInfo
Example – Phase 1
VITRelC
RelC
cs
math
…

VITC
RelC   CsalFloor
cs     42K
math   46K
…      …

VITD
Ddept  Dtechnician
cs     72K
math   65K
…      …
Example – Phase 2
Joined Variable Instantiation Table (JVIT) is the (natural) join of the VITs generated during Phase 1
1. CREATE VIEW JVIT(RelC, CsalFloor, Ddept, Dtechnician) AS
   SELECT VITRelC.RelC, VITC.CsalFloor, VITD.Ddept, VITD.Dtechnician
   FROM VITRelC, VITC, VITD
   WHERE VITRelC.RelC = VITD.Ddept
     AND VITC.CsalFloor > VITD.Dtechnician
     AND VITRelC.RelC = VITC.RelC
2. SELECT RelC, CsalFloor FROM JVIT
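The two Phase 2 steps can be run on a resident engine. This is a minimal sketch, assuming an in-memory SQLite database plays the resident SQL server and already holds the three VITs (salaries in $K). The salary comparison uses VITC.CsalFloor because CsalFloor is a column of VITC, not of VITRelC. With only the cs and math rows from the VIT tables, no row passes (42K < 72K and 46K < 65K). The sketch therefore adds a hypothetical ece row with invented values so that the answer is not empty.

```python
import sqlite3

resident = sqlite3.connect(":memory:")
resident.executescript("""
    -- VITs from Phase 1 (salaries in $K). The 'ece' rows are hypothetical.
    CREATE TABLE VITRelC (RelC TEXT);
    CREATE TABLE VITC    (RelC TEXT, CsalFloor INTEGER);
    CREATE TABLE VITD    (Ddept TEXT, Dtechnician INTEGER);
    INSERT INTO VITRelC VALUES ('cs'), ('math'), ('ece');
    INSERT INTO VITC    VALUES ('cs', 42), ('math', 46), ('ece', 50);
    INSERT INTO VITD    VALUES ('cs', 72), ('math', 65), ('ece', 45);

    -- Step 1: JVIT joins the VITs and applies the query's WHERE conditions.
    CREATE VIEW JVIT (RelC, CsalFloor, Ddept, Dtechnician) AS
    SELECT VITRelC.RelC, VITC.CsalFloor, VITD.Ddept, VITD.Dtechnician
    FROM VITRelC, VITC, VITD
    WHERE VITRelC.RelC = VITD.Ddept
      AND VITC.CsalFloor > VITD.Dtechnician
      AND VITRelC.RelC = VITC.RelC;
""")

# Step 2: project the SELECT list of the original SchemaSQL query.
answer = resident.execute("SELECT RelC, CsalFloor FROM JVIT").fetchall()
print(answer)  # [('ece', 50)]
```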
Example – Phase 2 (Aggregation)
Q: Find the average salary floor across all departments for each employee category in database univ-B
SELECT T.category, avg(T.D)
FROM univ-B::salInfo→ D, univ-B::salInfo T
WHERE D <> 'category'
GROUP BY T.category
univ-B::salInfo
category     cs    math   ece   …
Prof         72K   65K    78K   …
Assoc Prof   65K   54K    69K   …
…            …     …      …     …
Aggregation after Phase 2:
SELECT Tcategory, avg(TD)
FROM JVIT
GROUP BY Tcategory
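Because D ranges over attribute names, T.D requires one branch per attribute, so the JVIT effectively unpivots salInfo into rows (D, Tcategory, TD). The aggregation then runs over that relation. This is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for univ-B with only the cs, math and ece columns shown in the table (salaries in $K), and the attribute names are read from SQLite's catalog rather than from the FST. The condition D <> 'category' is applied while choosing the branches, and ORDER BY is added only for deterministic output.

```python
import sqlite3

univ_b = sqlite3.connect(":memory:")
univ_b.executescript("""
    CREATE TABLE salInfo (category TEXT, cs INTEGER, math INTEGER, ece INTEGER);
    INSERT INTO salInfo VALUES ('Prof', 72, 65, 78), ('Assoc Prof', 65, 54, 69);
""")

# Instantiations of D: attribute names of salInfo with D <> 'category'.
attrs = [row[1] for row in univ_b.execute("PRAGMA table_info(salInfo)")
         if row[1] != "category"]

# One branch per attribute name d yields rows (D = 'd', Tcategory, TD = T.d).
# UNION ALL keeps every salary value; the names come from the catalog, not from users.
branches = [
    f"SELECT '{d}' AS D, category AS Tcategory, \"{d}\" AS TD FROM salInfo"
    for d in attrs
]
univ_b.execute("CREATE VIEW JVIT AS " + " UNION ALL ".join(branches))

# Aggregation after Phase 2, as on the slide.
result = univ_b.execute(
    "SELECT Tcategory, avg(TD) FROM JVIT GROUP BY Tcategory ORDER BY Tcategory"
).fetchall()
print([(c, round(a, 2)) for c, a in result])  # [('Assoc Prof', 62.67), ('Prof', 71.67)]
```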