Perm: Processing Provenance and Data on the Same Data Model through Query Rewriting. Boris Glavic, Database Technology Group, Department of Informatics, University of Zurich. Gustavo Alonso, Systems Group, Department of Computer Science, ETH Zurich.
ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

Jun 19, 2015

Boris Glavic

Data provenance is information that describes how a given data item was produced. The provenance includes source and intermediate data as well as the transformations involved in producing the concrete data item. In the context of a relational database, the source and intermediate data items are relations, tuples and attribute values. The transformations are SQL queries and/or functions on the relational data items. Existing approaches capture provenance information by extending the underlying data model. This has the intrinsic disadvantage that the provenance must be stored and accessed using a different model than the actual data. In this paper, we present an alternative approach that uses query rewriting to annotate result tuples with provenance information. The rewritten query and its result use the same model and can thus be queried, stored and optimized using standard relational database techniques. In the paper we formalize the query rewriting procedures, prove their correctness, and evaluate a first implementation of the ideas using PostgreSQL. As the experiments indicate, our approach efficiently provides provenance information, inducing only a small overhead on normal operations.
Transcript
Page 1: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

Perm: Processing Provenance and Data on the Same Data Model through Query Rewriting

Boris Glavic
Database Technology Group, Department of Informatics, University of Zurich

Gustavo Alonso
Systems Group, Department of Computer Science, ETH Zurich

Page 2: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

Overview

1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Experimental Results
6. Conclusion

Page 3: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

1. Introduction

Relational Provenance
  Transformation: Query
  Data items: Base relations
  Data items: Result relation

Page 4: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

1. Introduction

Query

Which input data item(s) influenced which output data item(s)?
  Granularity
    Tuple
    Attribute value
    ...
  Contribution semantics
    Influence (Why)
    Copy (Where)
    ...

Page 5: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

1. Introduction

The problem of computing this type of provenance has been solved before.
See, e.g., [Cui, Widom ICDE '00]

but...
  Non-relational representation of provenance data
  Separation of provenance and "normal" data
  Non-relational computation of provenance data

Page 6: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

1. Introduction

Perm
  Provenance Extension of the Relational Model
  Provenance Management System

"Pure" relational representation of provenance
  Query result tuples and provenance tuples are represented as a single relation

Page 7: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

1. Introduction

Benefits: Provenance can be...
  ... stored in a standard DBMS
  ... queried using SQL
  ... directly interpreted by a user
Direct association between provenance and "normal data"

Page 8: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

1. Introduction

Provenance computation -> use query rewrite
  Given query q
  Generate query q+
    Computes the provenance of all result tuples from q

Page 9: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

1. Introduction

Benefits:
  Rewritten query is expressed in relational algebra
  Can be optimized and executed by an R-DBMS
    E.g. can be stored as a view
    Used as a subquery

Page 10: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

Overview

1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Results
6. Conclusion

Page 11: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

2. The Perm Approach

sales
sName   itemID
Migros  1
Migros  2
Migros  2
Coop    3
Coop    3

items
id  price
1   100
2   10
3   25

Page 12: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

2. The Perm Approach

Compute the sum of sales for each shop

SELECT sName, sum(price)
FROM sales, items
WHERE itemId = id
GROUP BY sName;
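To make the running example concrete, the query can be tried on the sample sales and items tables. The following is a minimal sketch that assumes SQLite (through Python's sqlite3 module) as a stand-in for the DBMS; the column types and the variable name totals are assumptions, not part of the slides.

```python
import sqlite3

# In-memory database holding the two example relations from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sName TEXT, itemID INTEGER);
    CREATE TABLE items (id INTEGER, price INTEGER);
    INSERT INTO sales VALUES
        ('Migros', 1), ('Migros', 2), ('Migros', 2), ('Coop', 3), ('Coop', 3);
    INSERT INTO items VALUES (1, 100), (2, 10), (3, 25);
""")

# The example query: sum of sales for each shop.
# ORDER BY is added only to make the output order deterministic.
totals = conn.execute("""
    SELECT sName, sum(price)
    FROM sales, items
    WHERE itemID = id
    GROUP BY sName
    ORDER BY sName
""").fetchall()

for shop, total in totals:
    print(shop, total)
# prints:
# Coop 50
# Migros 120
```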

Page 13: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

2. The Perm Approach

sales
sName   itemID
Migros  1
Migros  2
Migros  2
Coop    3
Coop    3

items
id  price
1   100
2   10
3   25

result
name    sum(price)
Migros  120
Coop    50

Page 16: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

2. The Perm Approach

Desired result format:

Original attributes | Relation 1 attributes | ... | Relation n attributes

Page 17: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

2. The Perm Approach

Original result     | sales               | items
name    sum(price)  | P(sName)  P(itemId) | P(id)  P(price)
Migros  120         | Migros    1         | 1      100
Migros  120         | Migros    2         | 2      10
Migros  120         | Migros    2         | 2      10
Coop    50          | Coop      3         | 3      25
Coop    50          | Coop      3         | 3      25
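Because the provenance lives in an ordinary relation, it can be produced and then queried with plain SQL. The sketch below is a hand-written query that yields a relation in the desired format for the running example and then asks which items tuples contributed to the Migros result. It assumes SQLite via Python's sqlite3 module; the query is written by hand for illustration and is not the output of Perm, and the names g, total and the p_ prefixes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sName TEXT, itemID INTEGER);
    CREATE TABLE items (id INTEGER, price INTEGER);
    INSERT INTO sales VALUES
        ('Migros', 1), ('Migros', 2), ('Migros', 2), ('Coop', 3), ('Coop', 3);
    INSERT INTO items VALUES (1, 100), (2, 10), (3, 25);
""")

# Hand-written query: original result columns followed by the attributes of
# the contributing sales and items tuples (p_ prefix is an assumed convention).
provenance_query = """
    SELECT agg.name AS name, agg.total AS total,
           prov.p_sName AS p_sName, prov.p_itemID AS p_itemID,
           prov.p_id AS p_id, prov.p_price AS p_price
    FROM (SELECT sName AS name, sum(price) AS total
          FROM sales, items
          WHERE itemID = id
          GROUP BY sName) AS agg
    LEFT OUTER JOIN
         (SELECT sName AS g,
                 sName AS p_sName, itemID AS p_itemID,
                 id AS p_id, price AS p_price
          FROM sales, items
          WHERE itemID = id) AS prov
    ON agg.name = prov.g
"""

rows = conn.execute(provenance_query + " ORDER BY name, p_itemID").fetchall()
for row in rows:
    print(row)

# The provenance relation is queried like any other relation:
# which items tuples contributed to the Migros total?
migros_items = conn.execute(
    f"SELECT DISTINCT p_id, p_price FROM ({provenance_query}) "
    "WHERE name = 'Migros' ORDER BY p_id"
).fetchall()
print(migros_items)  # [(1, 100), (2, 10)]
```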

Page 18: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

Overview

1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Results
6. Conclusion

Page 19: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

3. Query Rewriting for Provenance Computation

Rewrite method basics
  Use algebra representation of the query
  Replace every algebra operator with an algebra statement that propagates provenance alongside the original results

-> need a rewrite rule for each relational algebra operator

Page 20: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

3. Query Rewriting for Provenance Computation

Rewrite process

[Figure: operator tree of the query with operators op1, op2 and op3]

Page 21: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

3. Query Rewriting for Provenance Computation

Rewrite process

[Figure: applying a rewrite rule turns the operator tree with op1, op2 and op3 into a tree in which op1 is replaced by op1a, op1b and op1c]

Apply rewrite rule

Page 22: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

3. Query Rewriting for Provenance Computation

Rewrite process

[Figure: operator tree containing op3, op2 and the replacement operators op1a, op1b and op1c]

Apply rewrite rules
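The rewrite process can be pictured as a recursive pass over the operator tree in which each operator is replaced according to its own rule. The following is a toy sketch only, not the Perm implementation: the classes Base, Select and Aggregate, the functions to_sql and rewrite, and the p_ and g_ naming scheme are all hypothetical. The aggregation case follows the join-based rule shown in this talk; the rules used for base relations and selections are assumptions made for illustration. SQLite is used only to check that the generated SQL runs.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Base:            # leaf: a stored relation
    name: str
    attrs: List[str]


@dataclass
class Select:          # selection (WHERE) on top of a child operator
    condition: str
    child: "Op"


@dataclass
class Aggregate:       # GROUP BY with a single aggregate expression
    group_by: List[str]
    agg_expr: str
    agg_name: str
    child: "Op"


Op = Union[Base, Select, Aggregate]


def to_sql(op: Op) -> str:
    """SQL text of the original operator tree (no provenance)."""
    if isinstance(op, Base):
        return f"SELECT {', '.join(op.attrs)} FROM {op.name}"
    if isinstance(op, Select):
        return f"SELECT * FROM ({to_sql(op.child)}) AS t WHERE {op.condition}"
    groups = ", ".join(op.group_by)
    return (f"SELECT {groups}, {op.agg_expr} AS {op.agg_name} "
            f"FROM ({to_sql(op.child)}) AS t GROUP BY {groups}")


def rewrite(op: Op) -> Tuple[str, List[str]]:
    """Return (SQL of the rewritten tree, names of its provenance attributes)."""
    if isinstance(op, Base):
        # Assumed rule: duplicate every attribute under a provenance name.
        prov = [f"p_{op.name}_{a}" for a in op.attrs]
        cols = op.attrs + [f"{a} AS {p}" for a, p in zip(op.attrs, prov)]
        return f"SELECT {', '.join(cols)} FROM {op.name}", prov
    child_sql, prov = rewrite(op.child)      # rewrite the input first
    if isinstance(op, Select):
        # Assumed rule: filter the rewritten input; provenance passes through.
        return f"SELECT * FROM ({child_sql}) AS t WHERE {op.condition}", prov
    # Aggregation: original aggregation LEFT OUTER JOIN rewritten input
    # on the (renamed) group-by attributes.
    renamed = [f"g_{g}" for g in op.group_by]
    prov_cols = [f"{g} AS {r}" for g, r in zip(op.group_by, renamed)] + prov
    out_cols = ([f"agg.{c} AS {c}" for c in op.group_by + [op.agg_name]]
                + [f"prov.{p} AS {p}" for p in prov])
    on = " AND ".join(f"agg.{g} = prov.{r}" for g, r in zip(op.group_by, renamed))
    sql = (f"SELECT {', '.join(out_cols)} FROM ({to_sql(op)}) AS agg "
           f"LEFT OUTER JOIN (SELECT {', '.join(prov_cols)} "
           f"FROM ({child_sql}) AS t) AS prov ON {on}")
    return sql, prov


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
    INSERT INTO sales VALUES
        ('Migros', 'Jan', 100), ('Migros', 'Feb', 10), ('Migros', 'Mar', 10),
        ('Coop', 'Jan', 25), ('Coop', 'Feb', 25);
""")

# Aggregate over a selection over a base relation.
tree = Aggregate(["shop"], "sum(revenue)", "total",
                 Select("revenue >= 25", Base("sales", ["shop", "month", "revenue"])))
sql, prov_attrs = rewrite(tree)
rows = conn.execute(sql + " ORDER BY shop, p_sales_month").fetchall()
for row in rows:
    print(row)
```

Each case returns the list of provenance attributes together with the SQL text, so the parent operator knows which columns to carry along; this mirrors the idea of one rewrite rule per operator without claiming to match the actual rules in detail.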

Page 23: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

3. Query Rewriting for Provenance Computation

Rewrite rules notations:
  T+ : rewritten statement (query)
  P(T+) : provenance attributes

Page 24: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

3. Query Rewriting for Provenance Computation

Rewrite rules example:

Original query q:

SELECT agg, G
FROM T
GROUP BY G

Rewritten query q+:

SELECT agg, G, P(T)
FROM
  (SELECT agg, G FROM T GROUP BY G) AS agg
  LEFT OUTER JOIN
  (SELECT G AS G', P(T) FROM T) AS prov
  ON (G = G')

Page 25: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

3. Query Rewriting for Provenance Computation

Rewrite rules example:

SELECT sum(revenue) AS sum, shop
FROM sales
GROUP BY shop

sales
shop    month  revenue
Migros  Jan    100
Migros  Feb    10
Migros  Mar    10
Coop    Jan    25
Coop    Feb    25

result
sum  shop
120  Migros
50   Coop

Page 26: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

3. Query Rewriting for Provenance Computation

Rewritten query q+:

SELECT sum, shop, pShop, pMonth, pRevenue
FROM
  (SELECT sum(revenue) AS sum, shop FROM sales GROUP BY shop) AS agg
  LEFT OUTER JOIN
  (SELECT shop AS shop', pShop, pMonth, pRevenue FROM sales) AS prov
  ON (shop = shop')

Result of q+:

sum  shop    pShop   pMonth  pRevenue
120  Migros  Migros  Jan     100
120  Migros  Migros  Feb     10
120  Migros  Migros  Mar     10
50   Coop    Coop    Jan     25
50   Coop    Coop    Feb     25
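The rewritten aggregation query above can be tried on a standard engine. The sketch below assumes SQLite via Python's sqlite3 module. To be runnable, the provenance attributes are spelled out as explicit renamings of the sales columns (shop AS pShop and so on), the primed name shop' is replaced by the hypothetical name gShop, and the aggregate is named total instead of sum; these spellings are assumptions, not the slide's exact text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
    INSERT INTO sales VALUES
        ('Migros', 'Jan', 100), ('Migros', 'Feb', 10), ('Migros', 'Mar', 10),
        ('Coop', 'Jan', 25), ('Coop', 'Feb', 25);
""")

# Original aggregation joined with the renamed input tuples on the
# group-by attribute; ORDER BY only fixes the output order.
rows = conn.execute("""
    SELECT agg.total AS total, agg.shop AS shop,
           prov.pShop AS pShop, prov.pMonth AS pMonth, prov.pRevenue AS pRevenue
    FROM (SELECT sum(revenue) AS total, shop
          FROM sales
          GROUP BY shop) AS agg
    LEFT OUTER JOIN
         (SELECT shop AS gShop,
                 shop AS pShop, month AS pMonth, revenue AS pRevenue
          FROM sales) AS prov
    ON agg.shop = prov.gShop
    ORDER BY shop, pMonth
""").fetchall()

for row in rows:
    print(row)
# Every result tuple appears once per contributing sales tuple:
# three rows for Migros (total 120) and two rows for Coop (total 50).
```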

Page 29: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

Overview

1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Results
6. Conclusion

Page 30: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

4. Perm Implementation

Extension of PostgreSQL DBMS
  Implemented inside of PostgreSQL
    -> does not affect client applications
  Extended SQL language
  Perm module
    Implements algebraic rewrite rules as query rewrites

Page 31: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

4. Perm Implementation

SQL-PLE: SQL extension
  SELECT PROVENANCE ...

Nice benefits:
  CREATE VIEW x AS SELECT PROVENANCE ...
  SELECT PROVENANCE ... INTO x ...
  SELECT ... FROM (SELECT PROVENANCE ...
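SELECT PROVENANCE is specific to Perm, so it cannot be run on a stock engine. The sketch below emulates the three uses listed above in plain SQLite (via Python's sqlite3 module) by writing the rewritten query by hand: stored as a view, materialized into a table, and used as a subquery. The names sales_prov, sales_prov_mat, gShop and total are hypothetical, and CREATE TABLE ... AS stands in for SELECT ... INTO, which SQLite does not offer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (shop TEXT, month TEXT, revenue INTEGER);
    INSERT INTO sales VALUES
        ('Migros', 'Jan', 100), ('Migros', 'Feb', 10), ('Migros', 'Mar', 10),
        ('Coop', 'Jan', 25), ('Coop', 'Feb', 25);

    -- Provenance query stored as a view: evaluated each time it is read.
    CREATE VIEW sales_prov AS
    SELECT agg.shop AS shop, agg.total AS total,
           prov.pShop AS pShop, prov.pMonth AS pMonth, prov.pRevenue AS pRevenue
    FROM (SELECT shop, sum(revenue) AS total FROM sales GROUP BY shop) AS agg
    LEFT OUTER JOIN
         (SELECT shop AS gShop, shop AS pShop, month AS pMonth,
                 revenue AS pRevenue
          FROM sales) AS prov
    ON agg.shop = prov.gShop;

    -- Provenance materialized once into an ordinary table.
    CREATE TABLE sales_prov_mat AS SELECT * FROM sales_prov;
""")

# Provenance used as a subquery: how many contributing sales tuples per shop
# have a revenue of at least 25?
large = conn.execute("""
    SELECT shop, count(*)
    FROM (SELECT * FROM sales_prov) AS p
    WHERE pRevenue >= 25
    GROUP BY shop
    ORDER BY shop
""").fetchall()
print(large)  # [('Coop', 2), ('Migros', 1)]

# After a new sale, the view reflects it while the materialized copy does not.
conn.execute("INSERT INTO sales VALUES ('Coop', 'Mar', 5)")
from_view = conn.execute(
    "SELECT count(*) FROM sales_prov WHERE shop = 'Coop'").fetchone()[0]
from_table = conn.execute(
    "SELECT count(*) FROM sales_prov_mat WHERE shop = 'Coop'").fetchone()[0]
print(from_view, from_table)  # 3 2
```

The view corresponds to computing provenance on demand, the materialized table to computing it ahead of time; which one fits depends on how often the base data changes relative to how often the provenance is read.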

Page 32: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

4. Perm Implementation

Perm Architecture

[Figure: query processing pipeline with the stages Parser & Analyser, Rewriter, Perm Module, Planner and Executor. Labels on the data flow: SELECT PROVENANCE ...., Q = ..., Q' = ..., Q'+ = ..., MergeJoin (...]

Page 33: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

Overview

1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Experimental Results
6. Conclusion

Page 34: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

5. Experimental Results

TPC-H benchmark

Page 35: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

Overview

1. Introduction to Perm
2. The Perm Provenance Representation
3. Query Rewriting for Provenance Computation
4. Perm Implementation
5. Experimental Results
6. Conclusion

Page 36: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

6. Conclusion

Benefits
  Compute provenance for SQL
  Full SQL query power for provenance data
  Lazy or eager computation
  Reuse existing database technology
  Supports external provenance

Page 37: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

6. Conclusion

Future work
  Physical operators for more efficient provenance computation
  Storage compression
  Include transformation provenance
  Support different contribution semantics
  Support various granularities

Page 38: ICDE 2009 - Perm: Processing Provenance and Data on the same Data Model through Query Rewriting

Questions