Perm: Processing Provenance and Data on the Same Data Model through Query Rewriting
Boris Glavic, Database Technology Group, Department of Informatics, University of Zurich, [email protected]
Gustavo Alonso, Systems Group, Department of Computer Science, ETH Zurich, [email protected]
ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting
Data provenance is information that describes how a given data item was produced. The provenance includes source and intermediate data as well as the transformations involved in producing the concrete data item. In the context of relational databases, the source and intermediate data items are relations, tuples and attribute values. The transformations are SQL queries and/or functions on the relational data items. Existing approaches capture provenance information by extending the underlying data model. This has the intrinsic disadvantage that the provenance must be stored and accessed using a different model than the actual data. In this paper, we present an alternative approach that uses query rewriting to annotate result tuples with provenance information. The rewritten query and its result use the same model and can thus be queried, stored and optimized using standard relational database techniques. We formalize the query rewriting procedures, prove their correctness, and evaluate a first implementation of the ideas using PostgreSQL. As the experiments indicate, our approach provides provenance information efficiently, inducing only a small overhead on normal operations.
1. Introduction
The problem of computing this type of provenance has been solved before; see e.g. [Cui, Widom ICDE '00].
But...
- Non-relational representation of provenance data
- Separation of provenance and "normal" data
- Non-relational computation of provenance data
Perm (Provenance Extension of the Relational Model)
- Provenance management system
- "Pure" relational representation of provenance
- Query result tuples and provenance tuples are represented as a single relation
Benefits:
- Provenance can be...
  - ... stored in a standard DBMS
  - ... queried using SQL
  - ... directly interpreted by a user
- Direct association between provenance and "normal" data
Provenance computation -> use query rewrite
- Given a query q
- Generate a query q+
- q+ computes the provenance of all result tuples from q
Benefits:
- Rewritten query is expressed in relational algebra
- Can be optimized and executed by an RDBMS
  - E.g. can be stored as a view
  - Used as a subquery

3. Query Rewriting for Provenance Computation
Rewrite method basics:
- Use the algebra representation of the query
- Replace every algebra operator with an algebra statement that propagates provenance alongside the original results
-> need a rewrite rule for each relational algebra operator
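The rewrite method (one rule per algebra operator, applied across the whole operator tree) can be sketched as a recursive tree transformation. This is a minimal sketch, not the Perm implementation: the `Op` class, the `SCHEMA` dictionary, the `prov_` naming scheme, and the three rules for scan, selection, and projection are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical catalog: table name -> attribute names.
SCHEMA = {"sales": ("shop", "month", "revenue")}


@dataclass(frozen=True)
class Op:
    """A node of a (hypothetical) relational algebra tree."""
    kind: str                          # "scan", "select" or "project"
    arg: Tuple[str, ...] = ()          # table name, predicate, or output list
    children: Tuple["Op", ...] = ()
    prov: Tuple[str, ...] = ()         # provenance attributes carried by the node


def rewrite_scan(op: Op, kids: Tuple[Op, ...]) -> Op:
    # Assumed rule: duplicate every attribute under a provenance name.
    table = op.arg[0]
    cols = SCHEMA[table]
    prov = tuple(f"prov_{table}_{c}" for c in cols)
    renamed = tuple(f"{c} AS {p}" for c, p in zip(cols, prov))
    return Op("project", cols + renamed, (op,), prov)


def rewrite_select(op: Op, kids: Tuple[Op, ...]) -> Op:
    # Assumed rule: same predicate, evaluated over the rewritten input.
    (child,) = kids
    return Op("select", op.arg, (child,), child.prov)


def rewrite_project(op: Op, kids: Tuple[Op, ...]) -> Op:
    # Assumed rule: keep the output list and append the provenance attributes.
    (child,) = kids
    return Op("project", op.arg + child.prov, (child,), child.prov)


RULES: Dict[str, Callable[[Op, Tuple[Op, ...]], Op]] = {
    "scan": rewrite_scan,
    "select": rewrite_select,
    "project": rewrite_project,
}


def rewrite(op: Op, rules=RULES) -> Op:
    """Rewrite the children first, then apply the rule for this operator."""
    kids = tuple(rewrite(c, rules) for c in op.children)
    return rules[op.kind](op, kids)


# project[shop]( select[revenue > 20]( scan[sales] ) )
q = Op("project", ("shop",),
       (Op("select", ("revenue > 20",), (Op("scan", ("sales",)),)),))
q_plus = rewrite(q)
print(q_plus.arg)
```

The dispatch table makes the point of the slide concrete: supporting a new operator means adding one entry to `RULES`, and the traversal itself never changes.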
Rewrite process (figure): the rewrite rules are applied to the algebra tree of the query. In the example, the tree consists of the operators op1, op2 and op3; applying a rewrite rule replaces op1 with the operators op1a, op1b and op1c.
Rewrite rules notations:
- T+ : rewritten statement (query)
- P(T+) : provenance attributes
Rewrite rules example (aggregation):

    SELECT agg, G
    FROM T
    GROUP BY G

is rewritten (+) into

    SELECT agg, G, P(T)
    FROM
      (SELECT agg, G FROM T GROUP BY G) AS agg
      LEFT OUTER JOIN
      (SELECT G AS G', P(T) FROM T) AS prov
      ON (G = G')
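The aggregation rule can be read as a template that is filled in with the aggregate expression, the group-by attributes G, and the provenance attributes P(T). The following is a minimal sketch of such a template as a string builder. The function name `rewrite_aggregation`, its parameters, and the `g_` prefix used in place of the primed attribute G' are assumptions for illustration and do not come from the Perm implementation.

```python
def rewrite_aggregation(agg, group_by, table, prov):
    """Build the rewritten SQL for: SELECT agg, G FROM T GROUP BY G.

    agg      -- aggregate expression with alias, e.g. "sum(revenue) AS total"
    group_by -- list of group-by attribute names (G)
    table    -- name of the input table (T)
    prov     -- dict mapping each attribute of T to its provenance name (P(T))
    """
    g = ", ".join(group_by)
    # Rename the group-by attributes on the provenance side (G AS G').
    g_renamed = ", ".join(f"{c} AS g_{c}" for c in group_by)
    # Copy every input attribute under its provenance name.
    p = ", ".join(f"{col} AS {name}" for col, name in prov.items())
    p_names = ", ".join(prov.values())
    # Join each aggregated group with the input tuples of that group.
    on = " AND ".join(f"agg.{c} = prov.g_{c}" for c in group_by)
    return (
        f"SELECT agg.*, {p_names} FROM "
        f"(SELECT {agg}, {g} FROM {table} GROUP BY {g}) AS agg "
        f"LEFT OUTER JOIN "
        f"(SELECT {g_renamed}, {p} FROM {table}) AS prov "
        f"ON ({on})"
    )


sql = rewrite_aggregation(
    agg="sum(revenue) AS total",
    group_by=["shop"],
    table="sales",
    prov={"shop": "pShop", "month": "pMonth", "revenue": "pRevenue"},
)
print(sql)
```

One limitation of this sketch: the join condition uses plain equality, which does not match groups whose group-by value is NULL, so such groups would come back with NULL provenance attributes.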
Rewrite rules example:

    SELECT sum(revenue) AS sum, shop
    FROM sales
    GROUP BY shop

sales:
shop    month  revenue
Migros  Jan    100
Migros  Feb    10
Migros  Mar    10
Coop    Jan    25
Coop    Feb    25

result:
sum  shop
120  Migros
50   Coop
Rewritten query q+:

    SELECT sum, shop, pShop, pMonth, pRevenue
    FROM
      (SELECT sum(revenue) AS sum, shop FROM sales GROUP BY shop) AS agg
      LEFT OUTER JOIN
      (SELECT shop AS shop', shop AS pShop, month AS pMonth, revenue AS pRevenue FROM sales) AS prov
      ON (shop = shop')

Result of q+:
sum  shop    pShop   pMonth  pRevenue
120  Migros  Migros  Jan     100
120  Migros  Migros  Feb     10
120  Migros  Migros  Mar     10
50   Coop    Coop    Jan     25
50   Coop    Coop    Feb     25
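The example can be reproduced on any relational engine. The sketch below uses Python's built-in `sqlite3` module rather than PostgreSQL, purely so that it runs without setup. Two names differ from the slide and are assumptions of this sketch: the aggregate is aliased `total` instead of `sum`, and the primed attribute shop' is written `shop_g`, since a prime is not a valid unquoted identifier. The view name `sales_prov` is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Migros", "Jan", 100),
        ("Migros", "Feb", 10),
        ("Migros", "Mar", 10),
        ("Coop", "Jan", 25),
        ("Coop", "Feb", 25),
    ],
)

# Original query q.
ORIGINAL = """
SELECT sum(revenue) AS total, shop
FROM sales
GROUP BY shop
"""

# Rewritten query q+: every result tuple is paired with the input
# tuples of its group, copied under provenance attribute names.
REWRITTEN = """
SELECT total, shop, pShop, pMonth, pRevenue
FROM (SELECT sum(revenue) AS total, shop FROM sales GROUP BY shop) AS agg
LEFT OUTER JOIN
     (SELECT shop AS shop_g, shop AS pShop, month AS pMonth, revenue AS pRevenue
      FROM sales) AS prov
ON (shop = shop_g)
"""

result = conn.execute(ORIGINAL + "ORDER BY shop").fetchall()
provenance = conn.execute(REWRITTEN + "ORDER BY shop, pMonth").fetchall()

# q+ is ordinary SQL, so it can be stored as a view and queried further.
conn.execute("CREATE VIEW sales_prov AS " + REWRITTEN)
migros_months = [
    row[0]
    for row in conn.execute(
        "SELECT pMonth FROM sales_prov WHERE total > 100 ORDER BY pMonth"
    )
]

for row in provenance:
    print(row)
```

The last query illustrates the benefit claimed for the approach: asking which input tuples contributed to a given result tuple is a plain SQL filter over the view, with no separate provenance store or query language.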