CSE 636 Data Integration SchemaSQL Implementation

Dec 19, 2015


Architecture

[Figure: system architecture. The Federation User sends a SchemaSQL query to the SchemaSQL Server and receives the final answer. The SchemaSQL Server sends optimized local queries Q1 … Qn to DBMS1 … DBMSn and collects answer(Q1) … answer(Qn). A final series of SQL queries is then run on the Resident SQL Engine, which produces the final answer.]


SchemaSQL Server

• Maintains a Federation System Table (FST)
  – FST(db-name, rel-name, attr-name)
  – Names of databases, relations and attributes in the federation

• Compiles the instantiations of the variables in the query

• Enforces conditions, groupings, aggregations and mergings
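
The FST can be pictured as an ordinary catalog table. The sketch below is only an illustration, not the SchemaSQL server's actual mechanism: it uses SQLite attached databases as stand-ins for the member DBMSs, and the helper name build_fst and the underscore spellings (univ_C, db_name, and so on) are assumptions made so the names are valid SQL identifiers.

```python
import sqlite3

def build_fst(conn, db_names):
    """Fill FST(db_name, rel_name, attr_name) from the attached databases.

    Hypothetical helper: db_names are trusted schema names, so they are
    interpolated into the SQL text without escaping.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS FST (db_name TEXT, rel_name TEXT, attr_name TEXT)"
    )
    for db in db_names:
        rels = conn.execute(
            f"SELECT name FROM {db}.sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        for (rel,) in rels:
            # table_info rows are (cid, name, type, notnull, dflt_value, pk)
            cols = conn.execute(f"PRAGMA {db}.table_info({rel})").fetchall()
            for col in cols:
                conn.execute("INSERT INTO FST VALUES (?, ?, ?)", (db, rel, col[1]))

conn = sqlite3.connect(":memory:")
# Two in-memory databases play the role of DBMS1 ... DBMSn.
conn.execute("ATTACH DATABASE ':memory:' AS univ_C")
conn.execute("ATTACH DATABASE ':memory:' AS univ_D")
conn.execute("CREATE TABLE univ_C.cs (category TEXT, salFloor INTEGER)")
conn.execute("CREATE TABLE univ_C.math (category TEXT, salFloor INTEGER)")
conn.execute("CREATE TABLE univ_D.salInfo (dept TEXT, Prof INTEGER, technician INTEGER)")

build_fst(conn, ["univ_C", "univ_D"])
fst_rows = conn.execute(
    "SELECT * FROM FST ORDER BY db_name, rel_name, attr_name"
).fetchall()
```

Each FST row names one attribute of one relation in one database, so schema-level variables can later be instantiated with plain SQL over this table.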


Query Processing

Phase 1
• Corresponding to a set of variable declarations in the FROM clause, create VITs using one or more SQL queries against some local databases and/or the FST
  – VIT: Variable Instantiation Table, whose schema consists of all the variables in one or more variable declarations in the FROM clause

Phase 2
• Rewrite the original SchemaSQL query against the federation into an “equivalent” query against the set of VIT relations and compute it using the resident SQL server

Fixed Output Schema


Example

SELECT RelC, C.salFloor
FROM univ-C-> RelC,
     univ-C::RelC C,
     univ-D::salInfo D
WHERE RelC = D.dept AND
      C.salFloor > D.technician AND
      C.category = 'technician'

univ-C: cs

category     salFloor
Prof         74K
Assoc Prof   62K
…            …

univ-C: math

category     salFloor
Prof         67K
Assoc Prof   56K
…            …

univ-D: salInfo

dept   Prof   Assoc Prof   Asst Prof   …
cs     72K    65K          78K         …
math   65K    54K          69K         …
…      …      …            …           …


Example – Phase 1

• VITRelC(RelC):

SELECT rel-name AS RelC
FROM FST
WHERE db-name = 'univ-C'
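
As a rough illustration of this step, the sketch below runs the same kind of query against a small hand-filled FST in SQLite. The underscore column names and the sample FST rows are assumptions for the sketch. DISTINCT is also an addition here: this FST holds one row per attribute, so without it each relation name would appear once per attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FST (db_name TEXT, rel_name TEXT, attr_name TEXT)")
conn.executemany(
    "INSERT INTO FST VALUES (?, ?, ?)",
    [
        ("univ-C", "cs", "category"),
        ("univ-C", "cs", "salFloor"),
        ("univ-C", "math", "category"),
        ("univ-C", "math", "salFloor"),
        ("univ-D", "salInfo", "dept"),
        ("univ-D", "salInfo", "technician"),
    ],
)

# VITRelC holds every instantiation of the relation-name variable RelC.
conn.execute(
    "CREATE TABLE VITRelC AS "
    "SELECT DISTINCT rel_name AS RelC FROM FST WHERE db_name = 'univ-C'"
)
vit_relc = [row[0] for row in conn.execute("SELECT RelC FROM VITRelC ORDER BY RelC")]
```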


Example – Phase 1

• VITC(RelC, CsalFloor):

1. SELECT RelC FROM VITRelC

2. If {r1, …, rn} is the answer in step 1, then VITC is computed by the following SQL query to univ-C:

   SELECT 'r1' AS RelC, salFloor AS CsalFloor FROM r1
   WHERE category = 'technician'
   UNION
   …
   UNION
   SELECT 'rn' AS RelC, salFloor AS CsalFloor FROM rn
   WHERE category = 'technician'
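
The two steps above amount to generating one SELECT branch per relation name and joining the branches with UNION. A minimal sketch follows, assuming the relation names come from a trusted catalog (they are interpolated without escaping). The function name build_vitc_query is hypothetical, and the sample rows reuse the technician salary floors shown in the VITC table (42K and 46K, stored as integers).

```python
import sqlite3

def build_vitc_query(rel_names):
    """Return the UNION query that computes VITC(RelC, CsalFloor)."""
    branches = [
        f"SELECT '{rel}' AS RelC, salFloor AS CsalFloor "
        f"FROM \"{rel}\" WHERE category = 'technician'"
        for rel in rel_names
    ]
    return " UNION ".join(branches)

# A single SQLite database stands in for univ-C.
univ_c = sqlite3.connect(":memory:")
for rel, floor in [("cs", 42), ("math", 46)]:
    univ_c.execute(f'CREATE TABLE "{rel}" (category TEXT, salFloor INTEGER)')
    univ_c.execute(f'INSERT INTO "{rel}" VALUES (?, ?)', ("technician", floor))
    univ_c.execute(f'INSERT INTO "{rel}" VALUES (?, ?)', ("Prof", 70))

query = build_vitc_query(["cs", "math"])  # step 1 would supply these names
vitc_rows = sorted(univ_c.execute(query).fetchall())
```

Tagging each branch with the relation name as a constant is what turns schema information (the table name) into ordinary data in the RelC column.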


Example – Phase 1

• VITD(Ddept, Dtechnician):

SELECT dept AS Ddept, technician AS Dtechnician FROM salInfo


Example – Phase 1

VITRelC

RelC
cs
math

VITC

RelC   CsalFloor
cs     42K
math   46K
…      …

VITD

Ddept   Dtechnician
cs      72K
math    65K
…       …


Example – Phase 2

Joined Variable Instantiation Table (JVIT) is the (natural) join of the VITs generated during Phase 1

1. CREATE VIEW JVIT(RelC, CsalFloor, Ddept, Dtechnician) AS
   SELECT VITRelC.RelC, VITC.CsalFloor,
          VITD.Ddept, VITD.Dtechnician
   FROM VITRelC, VITC, VITD
   WHERE VITRelC.RelC = VITD.Ddept AND
         VITC.CsalFloor > VITD.Dtechnician AND
         VITRelC.RelC = VITC.RelC

2. SELECT RelC, CsalFloor
   FROM JVIT
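
To see Phase 2 run end to end, the sketch below builds the three VITs in SQLite (standing in for the resident SQL engine), defines JVIT as a view, and runs the final query. The salary values are invented for the sketch so that exactly one department qualifies; they are not the figures from the example tables. The salary comparison is written against VITC, the table that holds the CsalFloor column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VITRelC (RelC TEXT)")
conn.execute("CREATE TABLE VITC (RelC TEXT, CsalFloor INTEGER)")
conn.execute("CREATE TABLE VITD (Ddept TEXT, Dtechnician INTEGER)")

# Hypothetical sample values (in thousands), chosen for illustration only.
conn.executemany("INSERT INTO VITRelC VALUES (?)", [("cs",), ("math",)])
conn.executemany("INSERT INTO VITC VALUES (?, ?)", [("cs", 42), ("math", 46)])
conn.executemany("INSERT INTO VITD VALUES (?, ?)", [("cs", 40), ("math", 50)])

# JVIT joins the VITs and enforces the conditions of the WHERE clause.
conn.execute(
    """
    CREATE VIEW JVIT(RelC, CsalFloor, Ddept, Dtechnician) AS
    SELECT VITRelC.RelC, VITC.CsalFloor, VITD.Ddept, VITD.Dtechnician
    FROM VITRelC, VITC, VITD
    WHERE VITRelC.RelC = VITD.Ddept
      AND VITC.CsalFloor > VITD.Dtechnician
      AND VITRelC.RelC = VITC.RelC
    """
)

answer = conn.execute("SELECT RelC, CsalFloor FROM JVIT ORDER BY RelC").fetchall()
```

With these values only cs has a technician salary floor in univ-C (42) above the univ-D technician figure (40), so answer contains the single row for cs.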


Example – Phase 2 (Aggregation)

Q: Find the average salary floor across all departments for each employee category in database univ-B

SELECT T.category, avg(T.D)
FROM univ-B::salInfo-> D,
     univ-B::salInfo T
WHERE D <> 'category'
GROUP BY T.category

univ-B: salInfo

category     cs    math   ece   …
Prof         72K   65K    78K   …
Assoc Prof   65K   54K    69K   …
…            …     …      …     …


Aggregation After Phase 2

SELECT Tcategory, avg(TD)
FROM JVIT
GROUP BY Tcategory
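
The slides do not show how JVIT is built for this query, so the sketch below is one plausible construction rather than the documented one. It instantiates the attribute variable D with every column of salInfo except category, emits one SELECT branch per instantiation (mirroring the UNION construction used for VITC), and then runs the aggregation over the resulting view. The sample rows are the three department columns shown in the salInfo table, stored as integers in thousands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for univ-B and the resident engine
conn.execute("CREATE TABLE salInfo (category TEXT, cs INTEGER, math INTEGER, ece INTEGER)")
conn.executemany(
    "INSERT INTO salInfo VALUES (?, ?, ?, ?)",
    [("Prof", 72, 65, 78), ("Assoc Prof", 65, 54, 69)],
)

# Instantiate D: all attribute names of salInfo with D <> 'category'.
attrs = [row[1] for row in conn.execute("PRAGMA table_info(salInfo)").fetchall()]
d_values = [a for a in attrs if a != "category"]

# One branch per attribute turns each department column into rows (D, Tcategory, TD).
branches = [
    f"SELECT '{d}' AS D, category AS Tcategory, \"{d}\" AS TD FROM salInfo"
    for d in d_values
]
conn.execute("CREATE VIEW JVIT AS " + " UNION ALL ".join(branches))

averages = dict(
    conn.execute("SELECT Tcategory, avg(TD) FROM JVIT GROUP BY Tcategory")
)
```

UNION ALL is used so that equal salary values in different rows are never collapsed before averaging.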


References

1. L. V. S. Lakshmanan, F. Sadri, I. N. Subramanian: SchemaSQL – A Language for Interoperability in Relational Multi-database Systems. VLDB, 1996

2. L. V. S. Lakshmanan, F. Sadri, S. N. Subramanian: SchemaSQL – An Extension to SQL for Multidatabase Interoperability. TODS, 2001