Design Metrics for Data Warehouse Evolution
G. Papastefanatos1, P. Vassiliadis2, A. Simitsis3, Y. Vassiliou1
(1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr
(2) University of Ioannina, Ioannina, Hellas (Greece) [email protected]
(3) HP Labs, Palo Alto, California, USA [email protected]
ER'08, Barcelona, October 2008 3
Outline
• Motivation
• Graph-based modeling & DW Evolution
• Metrics for data warehouse evolution
• Evaluation
• Conclusions
Motivation
[Figure: labels WWW, Act1, Act2, Act3, Act4, Act5]

Data warehouses are evolving environments, e.g.:
– A dimension is removed or renamed
– The structure of a dimension table is updated
– A fact table is completely decoupled from a dimension
– The measures of a fact table change
– An ETL source is modified, etc.
Evolution Effects
• SW and data artifacts around the warehouse (e.g., ETL activities, materialized views, reports) are affected:
  – Syntactically, i.e., they become invalid
  – Semantically, i.e., they must conform to the new source database semantics
• Adaptation to new semantics
  – is a time-consuming task
  – is treated manually by the administrators/developers in most cases
• Evolution-driven design is missing
We would like to know…
• Can we measure and quantify, in a principled way, the vulnerability of certain parts of a data warehouse environment and find the constructs that are most sensitive to evolution?
• Can we predict and quantify the impact of a change on the rest of the system?
• What are the “right” measures for evaluating the quality of the design of a data warehouse, with respect to its evolution capabilities?
Data Warehouse Schema Evolution: Our approach
• Mechanism for performing what-if analysis for potential changes of database configurations
• Graph-based representation of database constructs (i.e., relations, views, constraints, queries)
• Annotation of the graph with rules for adapting queries to database schema evolution
[Figure: overview of the approach. Labels: Evolving databases; Evolving applications; Database Schema; Queries; Graph-based modeling for uniform representation; Rules for Handling Evolution; Metrics for Evaluating Evolution Design]
Graph based representation
[Figure: graph representation of the relations EMP and WORKS, the view V, and the query Q. Edge labels: S, from, map-select, where, group by, op, op1, op2. Other node labels: EMP.PK, W.EMP#.FK, =, >=, AND, 50K, SUM, GB, T_HOURS, HOURS]
EMP(Emp#, Name, Sal)
WORKS(Emp#, Proj#, Hours)

V: CREATE VIEW V AS
   SELECT Emp#, Hours
   FROM EMP E, WORKS W
   WHERE E.Emp# = W.Emp# AND E.Sal >= 50K

Q: SELECT Emp#, SUM(Hours) AS T_HOURS
   FROM V
   GROUP BY Emp#
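As a rough illustration of the graph representation, the sketch below stores the EMP / WORKS / V / Q example as a labelled edge list in Python. The edge direction (from a dependent node to the node it depends on), the node names such as `V.Emp#`, and the helpers `build_graph` and `depends_on` are assumptions made for this sketch. The where-clause, SUM, and group-by nodes of the figure are left out for brevity.

```python
from collections import defaultdict

# Hypothetical edge list: (source, target, label).
# Assumed direction: the source depends on the target.
EDGES = [
    # relation schemas: module -> attribute
    ("EMP", "EMP.Emp#", "S"),
    ("EMP", "EMP.Name", "S"),
    ("EMP", "EMP.Sal", "S"),
    ("WORKS", "WORKS.Emp#", "S"),
    ("WORKS", "WORKS.Proj#", "S"),
    ("WORKS", "WORKS.Hours", "S"),
    # view V over EMP and WORKS
    ("V", "V.Emp#", "S"),
    ("V", "V.Hours", "S"),
    ("V", "EMP", "from"),
    ("V", "WORKS", "from"),
    ("V.Emp#", "EMP.Emp#", "map-select"),
    ("V.Hours", "WORKS.Hours", "map-select"),
    # query Q over V (the SUM and group-by nodes are omitted here)
    ("Q", "Q.Emp#", "S"),
    ("Q", "Q.T_HOURS", "S"),
    ("Q", "V", "from"),
    ("Q.Emp#", "V.Emp#", "map-select"),
    ("Q.T_HOURS", "V.Hours", "map-select"),
]


def build_graph(edges):
    """Return an adjacency map: node -> list of (target, label)."""
    out_edges = defaultdict(list)
    for src, dst, label in edges:
        out_edges[src].append((dst, label))
    return dict(out_edges)


def depends_on(graph, node):
    """Nodes that `node` points to, i.e. directly depends on."""
    return [dst for dst, _ in graph.get(node, [])]


GRAPH = build_graph(EDGES)
print(depends_on(GRAPH, "V.Emp#"))
```

A plain edge list keeps relations, views, and queries in one uniform structure, which is the point of the graph-based modeling step.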
Graph Annotation with rules
We annotate (1) a graph element, for reacting to (2) an evolution event, with (3) a rule.

1. Set of graph elements
· Query Node: Q1
· Attribute Node: EMP.E_TITLE
· View Node: Emps_Prjs, etc.

2. Set of evolution events
· Add Attribute
· Delete Attribute
· Rename View, etc.

3. Set of rules
· Propagate
· Block
· Prompt
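One way to picture the annotation is as a lookup table from (graph element, evolution event) to a rule. The Python sketch below is only an illustration: the pairings in `ANNOTATIONS`, the function `policy_for`, and the fallback to `PROMPT` when no annotation exists are assumptions, not something stated on the slide.

```python
from enum import Enum


class Policy(Enum):
    PROPAGATE = "propagate"
    BLOCK = "block"
    PROMPT = "prompt"


# Hypothetical annotations: (graph element, evolution event) -> rule.
# The specific pairings are invented for illustration.
ANNOTATIONS = {
    ("Q1", "add attribute"): Policy.PROPAGATE,
    ("EMP.E_TITLE", "delete attribute"): Policy.BLOCK,
    ("Emps_Prjs", "rename view"): Policy.PROMPT,
}


def policy_for(node, event, default=Policy.PROMPT):
    """Look up the rule for an event on a node.

    Falling back to PROMPT is an assumption of this sketch.
    """
    return ANNOTATIONS.get((node, event), default)


print(policy_for("Q1", "add attribute"))
print(policy_for("Q1", "rename view"))
```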
Graph Adaptation
Event: Add attribute Phone to relation EMP
Rule: ON attribute addition TO EMP THEN propagate

Annotated Query Graph
Q: SELECT EID, Name FROM EMP

Transformed Query Graph
Q: SELECT EID, Name, Phone FROM EMP

[Figure: the annotated query graph of Q over EMP (attributes EID, Name) and the transformed graph, in which the attribute Phone appears under both EMP and Q with a new map-select edge]
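The adaptation step can be sketched in a few lines of Python. This is a simplified stand-in for the graph transformation, not the authors' implementation: the dictionary layout, the rule key `("EMP", "attribute addition")`, and the functions `on_attribute_addition` and `to_sql` are all assumptions of the sketch.

```python
# Hypothetical, simplified state: one relation, one query, one rule.
schema = {"EMP": ["EID", "Name"]}
query = {"name": "Q", "select": ["EID", "Name"], "from": "EMP"}
rules = {("EMP", "attribute addition"): "propagate"}


def on_attribute_addition(schema, query, rules, relation, attribute):
    """Return the schema and query after adding `attribute` to `relation`.

    The query is extended only if the relation carries a 'propagate'
    rule for attribute additions; inputs are left unmodified.
    """
    new_schema = {rel: list(attrs) for rel, attrs in schema.items()}
    new_schema[relation].append(attribute)

    new_query = dict(query, select=list(query["select"]))
    rule = rules.get((relation, "attribute addition"))
    if query["from"] == relation and rule == "propagate":
        new_query["select"].append(attribute)
    return new_schema, new_query


def to_sql(query):
    """Render the simplified query dictionary as SQL text."""
    return "SELECT " + ", ".join(query["select"]) + " FROM " + query["from"]


new_schema, new_query = on_attribute_addition(schema, query, rules, "EMP", "Phone")
print(to_sql(query))      # query before the event
print(to_sql(new_query))  # query after the event
```

With a `block` rule in place of `propagate`, the same call would extend the schema but leave the query unchanged.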
Simple Metrics
Simple: in-degree, out-degree, degree

EMP.Emp# is more “important” than EMP.Sal with respect to how many nodes depend directly on it.
[Figure: graph of the relations EMP and WORKS and the view V, with the same edge labels as in the graph-based representation (S, from, map-select, where, op, op1, op2)]
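The simple degree metrics can be computed directly from an edge list. The Python sketch below uses a hand-made approximation of the edges around EMP in the figure, so the counts are illustrative only. The edge direction (dependent node to the node it depends on) and the node names `join_cond` and `sal_cond`, standing for the `=` and `>=` conditions, are assumptions.

```python
from collections import Counter

# Hypothetical edges (source depends on target), approximating the figure.
EDGES = [
    ("EMP", "EMP.Emp#"),        # schema edge
    ("EMP", "EMP.Name"),        # schema edge
    ("EMP", "EMP.Sal"),         # schema edge
    ("EMP.PK", "EMP.Emp#"),     # primary key constraint
    ("V.Emp#", "EMP.Emp#"),     # map-select from the view
    ("join_cond", "EMP.Emp#"),  # E.Emp# = W.Emp#
    ("sal_cond", "EMP.Sal"),    # E.Sal >= 50K
]


def degrees(edges):
    """Return node -> (in-degree, out-degree, degree)."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    nodes = set(in_deg) | set(out_deg)
    return {n: (in_deg[n], out_deg[n], in_deg[n] + out_deg[n]) for n in nodes}


result = degrees(EDGES)
print(result["EMP.Emp#"], result["EMP.Sal"])
```

Under these assumed edges, EMP.Emp# has a higher in-degree than EMP.Sal, which matches the slide's remark that more nodes depend directly on it.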
Transitive Metrics
Transitive: in-degree, out-degree, degree
The variant with a view plus a query is more “complicated” with respect to how many nodes are involved in the propagation of EMP.Emp# towards the end.
[Figure: graph of the relations EMP and WORKS, the view V, and the query Q on top of V, with edge labels S, from, map-select, where, group by, op, op1, op2]
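One plausible reading of the transitive in-degree is the number of nodes that reach a given node through a path of any length. The Python sketch below follows that reading, which is an assumption, and compares two hypothetical variants: a query defined directly on EMP, and a query defined on a view over EMP. The edge lists are simplified and the function name `transitive_in_degree` is made up for the example.

```python
from collections import defaultdict, deque


def transitive_in_degree(edges, target):
    """Count the nodes that can reach `target` via one or more edges."""
    preds = defaultdict(set)
    for src, dst in edges:
        preds[dst].add(src)

    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for pred in preds[node]:
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    seen.discard(target)  # do not count the node itself if it lies on a cycle
    return len(seen)


# Hypothetical variant 1: query Q reads EMP directly.
QUERY_ONLY = [
    ("EMP", "EMP.Emp#"), ("Q", "Q.Emp#"),
    ("Q.Emp#", "EMP.Emp#"), ("Q", "EMP"),
]

# Hypothetical variant 2: query Q reads view V, which reads EMP.
VIEW_AND_QUERY = [
    ("EMP", "EMP.Emp#"), ("V", "V.Emp#"), ("Q", "Q.Emp#"),
    ("V.Emp#", "EMP.Emp#"), ("Q.Emp#", "V.Emp#"),
    ("V", "EMP"), ("Q", "V"),
]

print(transitive_in_degree(QUERY_ONLY, "EMP.Emp#"))
print(transitive_in_degree(VIEW_AND_QUERY, "EMP.Emp#"))
```

In this toy setting the view-plus-query variant involves more nodes than the query-only variant, which is the kind of difference the transitive metric is meant to expose.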
Zoomed-out degrees
[Figure: zoomed-out graph with the modules EMP, WORKS, V, and Q, connected by edges weighted 4, 1, 3, 3]

• Only top-level nodes are retained
• Only one edge between modules is retained, weighted with the number of edges suppressed

Simple degrees
Transitive degrees
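The zoom-out operation can be sketched as collapsing every node into its top-level module and counting the edges that get merged. In the Python sketch below, the naming convention `Module.attribute`, the helper names, and the edge list are assumptions, so the resulting weights are not meant to reproduce the numbers in the figure.

```python
from collections import Counter

# Hypothetical detailed edges (source depends on target).
EDGES = [
    ("EMP", "EMP.Emp#"), ("WORKS", "WORKS.Hours"),
    ("V", "V.Emp#"), ("V", "V.Hours"),
    ("Q", "Q.Emp#"), ("Q", "Q.T_HOURS"),
    ("V", "EMP"), ("V", "WORKS"), ("Q", "V"),
    ("V.Emp#", "EMP.Emp#"), ("V.Hours", "WORKS.Hours"),
    ("Q.Emp#", "V.Emp#"), ("Q.T_HOURS", "V.Hours"),
]


def module_of(node):
    """Top-level module of a node, assuming 'Module.attribute' names."""
    return node.split(".", 1)[0]


def zoom_out(edges):
    """Collapse nodes into modules; weight = number of suppressed edges."""
    weights = Counter()
    for src, dst in edges:
        m_src, m_dst = module_of(src), module_of(dst)
        if m_src != m_dst:  # edges inside one module disappear
            weights[(m_src, m_dst)] += 1
    return dict(weights)


zoomed = zoom_out(EDGES)
print(zoomed)
```

The simple and transitive degree metrics can then be recomputed on the zoomed-out graph, using the weights instead of plain edge counts.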
Entropy-based metrics
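Probability that a node v is affected by an event occurring on another node y_k:

$$P(v \mid y_k) = \frac{paths(v, y_k)}{\sum_{y_i \in V} paths(v, y_i)}, \quad \text{for all nodes } y_i \in V.$$

[Figure: zoomed-out graph with the modules EMP, WORKS, V, and Q]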
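The probability above can be illustrated with a small path-counting routine in Python. The sketch makes several assumptions that the slide does not state: the graph is acyclic, edges are unweighted, an edge points from a node to the node it depends on, and a node is not counted as a path target of itself. The module-level graph and the function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical zoomed-out graph: node -> nodes it depends on.
GRAPH = {
    "Q": ["V"],
    "V": ["EMP", "WORKS"],
    "EMP": [],
    "WORKS": [],
}


def count_paths(graph, src, dst):
    """Number of distinct paths from src to dst in an acyclic graph."""
    @lru_cache(maxsize=None)
    def walk(node):
        if node == dst:
            return 1
        return sum(walk(nxt) for nxt in graph.get(node, []))
    return walk(src)


def affect_probability(graph, v, y_k):
    """P(v | y_k): share of v's outgoing paths that end at y_k."""
    if v == y_k:
        return 0.0  # assumption: a node is not its own path target
    others = [y for y in graph if y != v]
    total = sum(count_paths(graph, v, y) for y in others)
    if total == 0:
        return 0.0  # v depends on nothing, so no event can reach it
    return count_paths(graph, v, y_k) / total


print(affect_probability(GRAPH, "V", "EMP"))
print(affect_probability(GRAPH, "Q", "V"))
```

In this toy graph V depends equally on EMP and WORKS, so an event on either one accounts for half of V's exposure.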