Dec 23, 2015

Download

Documents

Colleen Nichols
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Interoperability for Provenance-aware Databases using PROV and JSON Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy Oracle Corporation Raghav Kapoor,

Interoperability for Provenance-aware Databases

using PROV and JSON

Dieter Gawlick, Zhen Hua Liu, Vasudha Krishnaswamy

Oracle Corporation

Raghav Kapoor, Boris Glavic

Illinois Institute of Technology

Venkatesh Radhakrishnan

Facebook

Xing Niu

Illinois Institute of Technology


Outline

① Introduction

② Related Work

③ Overview

④ Export and Import

⑤ Experimental Results

⑥ Conclusions and Future Work

Introduction

• The PROV standards
  – A standardized, extensible representation of provenance graphs
  – Exchange of provenance information between systems

• Provenance-aware DBMS
  – Compute the provenance of database operations
  – E.g., Perm [1], GProM [2], DBNotes [3], Orchestra [4], LogicBlox [5]

[1] B. Glavic, R. J. Miller, and G. Alonso. Using SQL for Efficient Generation and Querying of Provenance Information. In In Search of Elegance in the Theory and Practice of Computation, pages 291–320. Springer, 2013.
[2] B. Arab, D. Gawlick, V. Radhakrishnan, H. Guo, and B. Glavic. A generic provenance middleware for database queries, updates, and transactions. In TaPP, 2014.
[3] D. Bhagwat, L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. An Annotation Management System for Relational Databases. VLDB Journal, 14(4):373–396, 2005.
[4] G. Karvounarakis, T. J. Green, Z. G. Ives, and V. Tannen. Collaborative data sharing via update exchange and provenance. TODS, 38(3):19, 2013.
[5] S. Huang, T. Green, and B. Loo. Datalog and emerging applications: an interactive tutorial. In SIGMOD, pages 1213–1216, 2011.

Introduction

• Example: extracting demographic information from tweets

Introduction

• Problem: no relational database system supports both tracking database provenance and importing/exporting provenance in PROV
  – Provenance-aware databases are not capable of exporting provenance in standardized formats
  – E.g., GProM essentially produces wasDerivedFrom edges between the output tuples of a query Q and its inputs
  – However, these edges are not available as PROV graphs
  – There is no way to track the derivation back to non-database entities

Introduction

• GProM system
  – Computes provenance for database operations: queries, updates, and transactions
  – Uses SQL language extensions, e.g., PROVENANCE OF (SELECT ...)
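To make "computing provenance" concrete, here is a minimal Python sketch of the kind of result PROVENANCE OF returns for an aggregation query: one row per pair of output tuple and contributing input tuple. It assumes a hypothetical relation user(id, state, age) and the query Q = SELECT state, AVG(age) FROM user GROUP BY state (suggested by the F0.STATE and F0."AVG(AGE)" columns on the export slide). The data and the PROV_* column names are illustrative; GProM itself computes this by rewriting Q in SQL, not in Python.

```python
from collections import defaultdict

# Hypothetical relation user(id, state, age); ids and values are made up.
USERS = [
    {"id": "t1", "state": "IL", "age": 20},
    {"id": "t2", "state": "IL", "age": 30},
    {"id": "t3", "state": "CA", "age": 40},
]


def avg_age_by_state(rows):
    """Q: SELECT state, AVG(age) FROM user GROUP BY state."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["state"]].append(r["age"])
    return {state: sum(ages) / len(ages) for state, ages in groups.items()}


def provenance_of(rows):
    """Pair each output tuple of Q with every input tuple of its group.

    One result row per (output tuple, contributing input tuple). The PROV_*
    column names are hypothetical.
    """
    result = avg_age_by_state(rows)
    return [
        {
            "STATE": r["state"],
            "AVG(AGE)": result[r["state"]],
            "PROV_USER_ID": r["id"],
            "PROV_USER_AGE": r["age"],
        }
        for r in rows
    ]


for row in provenance_of(USERS):
    print(row)
```

Both IL rows carry the same output values (IL, 25.0) but point to different input tuples, which is exactly what makes each row readable as a separate derivation.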

Introduction

• Example of GProM in action
  – The result of PROVENANCE OF for query Q
  – Each tuple in this result represents one wasDerivedFrom assertion
  – E.g., tuple to1 was derived from tuple t1
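Reading each result row as one wasDerivedFrom assertion translates directly into PROV-JSON. The following minimal Python sketch assumes the provenance result has already been reduced to (output tuple id, input tuple id) pairs. The ids, the `db` prefix, and its namespace URL are hypothetical. It uses the PROV-JSON keys `prov:generatedEntity` and `prov:usedEntity` for derivations.

```python
import json

# Hypothetical PROVENANCE OF result rows: (output tuple id, input tuple id).
PROV_ROWS = [("to1", "t1"), ("to1", "t2"), ("to2", "t3")]


def to_prov_json(pairs):
    """Build a minimal PROV-JSON document with one wasDerivedFrom per pair."""
    doc = {
        # Hypothetical namespace for database tuples.
        "prefix": {"db": "http://example.org/db#"},
        "entity": {},
        "wasDerivedFrom": {},
    }
    for i, (out_id, in_id) in enumerate(pairs):
        out_q, in_q = "db:" + out_id, "db:" + in_id
        # Declare both tuples as entities; dict keys remove duplicates.
        doc["entity"].setdefault(out_q, {})
        doc["entity"].setdefault(in_q, {})
        # One derivation per result row, keyed by a blank-node identifier.
        doc["wasDerivedFrom"][f"_:wdf{i}"] = {
            "prov:generatedEntity": out_q,
            "prov:usedEntity": in_q,
        }
    return doc


print(json.dumps(to_prov_json(PROV_ROWS), indent=2))
```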

Introduction

• Goal: make databases interoperable with other provenance systems

• Approach:
  – Export and import of provenance as PROV-JSON
  – Propagation of imported provenance
  – Implemented in GProM using SQL

Outline

① Introduction

② Related work

③ Overview

④ Export and Import

⑤ Experimental Results

⑥ Conclusions and Future Work

Related Work

• Integrating provenance graphs by identifying common elements [6]

• Addressing the interoperability problem between databases and other provenance-aware systems through
  – A common model for both types of provenance [7][8][9]
  – Monitoring database access to link database provenance with other provenance systems [10][11]

[6] A. Gehani and D. Tariq. Provenance integration. In TaPP, 2014.
[7] U. Acar, P. Buneman, J. Cheney, J. van den Bussche, N. Kwasnikowska, and S. Vansummeren. A graph model of data and workflow provenance. In TaPP, 2010.
[8] Y. Amsterdamer, S. Davidson, D. Deutch, T. Milo, J. Stoyanovich, and V. Tannen. Putting Lipstick on Pig: Enabling Database-style Workflow Provenance. PVLDB, 5(4):346–357, 2011.
[9] D. Deutch, Y. Moskovitch, and V. Tannen. A provenance framework for data-dependent process analysis. PVLDB, 7(6), 2014.
[10] F. Chirigati and J. Freire. Towards integrating workflow and database provenance. In IPAW, pages 11–23, 2012.
[11] Q. Pham, T. Malik, B. Glavic, and I. Foster. LDV: Light-weight Database Virtualization. In ICDE, pages 1179–1190, 2015.

Outline

① Introduction

② Related Work

③ Overview

④ Export and Import

⑤ Experimental Results

⑥ Conclusions and Future Work

Overview

• We introduce techniques for
  – Exporting database provenance as PROV documents
  – Importing PROV graphs alongside data
  – Linking outputs of SQL operations to the imported provenance of their inputs

• The implementation in GProM offloads generation of PROV documents to the backend database
  – Using SQL and string concatenation

Outline

① Introduction

② Related Work

③ Overview

④ Export and Import

⑤ Experimental Results

⑥ Conclusions and Future Work

Export and Import

• Export
  – Added a TRANSLATE AS clause, e.g., PROVENANCE OF (SELECT ...) TRANSLATE AS …
  – Construct a PROV-JSON document from the database provenance:
    ① Run several projections over the provenance computation, e.g., '"_:wgb\(' || F0.STATE || '|' || F0."AVG(AGE)" || '\)' …
    ② Use aggregation to concatenate all snippets of a certain type, e.g., entity nodes, wasGeneratedBy edges, and used edges
    ③ Use string concatenation to create the final document
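The three steps can be illustrated with a minimal Python sketch that mimics, outside the database, what the generated SQL does. It performs a per-row snippet projection, a per-type string aggregation, and a final concatenation. In SQL the aggregation would be a string aggregate such as Oracle's LISTAGG; here `", ".join` plays that role. The provenance rows, the `db` namespace, and the snippet identifiers are hypothetical. Underscores also replace the escaped parentheses of the slide's `_:wgb\(...\)` identifiers to keep the JSON simple.

```python
import json

# Hypothetical provenance rows: (STATE, AVG(AGE), input tuple id).
PROV_ROWS = [("IL", "25", "t1"), ("IL", "25", "t2"), ("CA", "40", "t3")]


def entity_snippet(eid):
    return '"' + eid + '": {}'


def wgb_snippet(state, avg_age):
    # Step 1: per-row projection built by string concatenation, analogous to
    # '"_:wgb\(' || F0.STATE || '|' || F0."AVG(AGE)" || '\)' in SQL.
    key = "_:wgb_" + state + "_" + avg_age
    out = "db:out_" + state + "_" + avg_age
    return '"' + key + '": {"prov:entity": "' + out + '", "prov:activity": "db:Q"}'


def used_snippet(input_id):
    return ('"_:u_' + input_id + '": {"prov:activity": "db:Q", '
            '"prov:entity": "db:' + input_id + '"}')


def concat_distinct(snippets):
    # Step 2: aggregate all snippets of one type into one string,
    # dropping duplicates while keeping first-seen order.
    return ", ".join(dict.fromkeys(snippets))


def build_document(rows):
    entities = concat_distinct(
        [entity_snippet("db:out_" + s + "_" + a) for s, a, _ in rows]
        + [entity_snippet("db:" + i) for _, _, i in rows]
    )
    wgb = concat_distinct(wgb_snippet(s, a) for s, a, _ in rows)
    used = concat_distinct(used_snippet(i) for _, _, i in rows)
    # Step 3: splice the aggregated parts into the final document.
    return ('{"prefix": {"db": "http://example.org/db#"}, '
            '"entity": {' + entities + '}, '
            '"activity": {"db:Q": {}}, '
            '"wasGeneratedBy": {' + wgb + '}, '
            '"used": {' + used + '}}')


doc_text = build_document(PROV_ROWS)
doc = json.loads(doc_text)  # the concatenated string must be valid JSON
print(sorted(doc["wasGeneratedBy"]))
```

Because snippets are assembled by plain concatenation, values containing quotes or backslashes would need escaping first. The sketch skips that step, and parsing the result with `json.loads` is a cheap way to catch such mistakes.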

Export and Import

• Example: part of the final PROV document
  – In the figure, red dotted lines mark the part of the graph in the DB

Export and Import

• Import
  – Import PROV for an existing relation
  – Provide a language construct IMPORT PROV FOR ...
  – Import available PROV graphs for imported tuples and store them alongside the data
  – Add three columns to each table to store imported provenance:
    • prov_doc: stores a PROV-JSON snippet representing the tuple's provenance
    • prov_eid: indicates which of the entities in this snippet represents the imported tuple
    • prov_time: stores a timestamp of when the tuple was imported
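A minimal Python sketch of the three provenance columns and of what an import step has to check. It assumes a hypothetical relation user(id, state, age) and a made-up snippet in the spirit of the tweet example, where a user record was derived from a tweet. `import_prov` is a hypothetical helper, not the IMPORT PROV FOR implementation. It only shows that the designated entity must occur in the snippet and that the import time is recorded.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class UserRow:
    # Data columns of a hypothetical relation user(id, state, age).
    id: str
    state: str
    age: int
    # The three extra columns for imported provenance.
    prov_doc: Optional[str] = None  # PROV-JSON snippet for this tuple
    prov_eid: Optional[str] = None  # entity in prov_doc that is this tuple
    prov_time: Optional[datetime] = None  # when the tuple was imported


def import_prov(row, prov_doc, entity_id):
    """Attach an imported PROV-JSON snippet to an existing tuple."""
    doc = json.loads(prov_doc)
    # The designated entity must exist in the snippet.
    if entity_id not in doc.get("entity", {}):
        raise ValueError(entity_id + " is not an entity in the snippet")
    row.prov_doc = prov_doc
    row.prov_eid = entity_id
    row.prov_time = datetime.now(timezone.utc)
    return row


# Hypothetical snippet: a user record extracted from a tweet.
SNIPPET = json.dumps({
    "prefix": {"ex": "http://example.org/ext#"},
    "entity": {"ex:tweet42": {}, "ex:user7": {}},
    "activity": {"ex:extract": {}},
    "wasDerivedFrom": {
        "_:d1": {"prov:generatedEntity": "ex:user7", "prov:usedEntity": "ex:tweet42"}
    },
})

row = import_prov(UserRow("t1", "IL", 20), SNIPPET, "ex:user7")
print(row.prov_eid, row.prov_time is not None)
```

Storing the snippet as text keeps the table schema fixed regardless of how large or deep the imported graph is. `prov_eid` is what later lets an export connect this tuple to the query activity.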

Export and Import

• Import: example
  – Relation user with imported provenance
  – Attribute value d is the previous PROV graph without the database activities and entities

Export and Import

• Using imported provenance during export
  – Include the imported provenance as bundles in the generated PROV graph
    • Bundles [13] enable nesting of PROV graphs within PROV graphs, treating a nested graph as a new entity
  – Connect the entities representing input tuples in the imported provenance to the query activity and output tuple entities
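A minimal Python sketch of the export side: nest a tuple's imported snippet under the PROV-JSON `bundle` key, then link the input entity named by prov_eid to the query activity and the output tuple. All identifiers (`db:Q`, `db:to1`, `db:b_t1`, the `ex` and `db` namespaces) are hypothetical. For brevity the sketch does not declare the bundle itself as an entity.

```python
import json

# Hypothetical imported snippet stored with tuple t1 (prov_doc / prov_eid).
IMPORTED = {
    "prefix": {"ex": "http://example.org/ext#"},
    "entity": {"ex:tweet42": {}, "ex:user7": {}},
    "wasDerivedFrom": {
        "_:d1": {"prov:generatedEntity": "ex:user7", "prov:usedEntity": "ex:tweet42"}
    },
}


def export_with_bundle(imported, input_eid, output_eid, bundle_id):
    """Nest imported provenance as a bundle and link it to the query."""
    prefixes = {"db": "http://example.org/db#"}
    # The top level references input_eid, so its prefix must be declared too.
    prefixes.update(imported.get("prefix", {}))
    return {
        "prefix": prefixes,
        "entity": {output_eid: {}, input_eid: {}},
        "activity": {"db:Q": {}},
        # The query activity used the input tuple entity from the bundle ...
        "used": {"_:u1": {"prov:activity": "db:Q", "prov:entity": input_eid}},
        # ... and generated the output tuple entity.
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": output_eid, "prov:activity": "db:Q"}
        },
        "wasDerivedFrom": {
            "_:d2": {"prov:generatedEntity": output_eid, "prov:usedEntity": input_eid}
        },
        # The imported graph is nested unchanged under its bundle identifier.
        "bundle": {bundle_id: imported},
    }


doc = export_with_bundle(IMPORTED, "ex:user7", "db:to1", "db:b_t1")
print(json.dumps(doc, indent=2))
```

Keeping the imported graph intact inside the bundle means the database never rewrites provenance it did not produce. Only the few connecting edges at the top level are new.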

[13] P. Missier, K. Belhajjame, and J. Cheney. The W3C PROV family of specifications for modelling provenance metadata. In EDBT, pages 773–776, 2013.

Export and Import

• Example of Bundles:

Export and Import

• Handling updates
  – If a tuple is modified, e.g., by running an SQL UPDATE statement, this should be reflected when provenance is exported

• Example
  – Assume the user has run an update to correct tuple t1's age value (setting age to 70) before running the query

Export and Import

• Challenge
  – How to track the provenance of updates under transactional semantics

• Solution
  – GProM uses the novel concept of reenactment queries
  – The user can request the provenance of a past update, a transaction, or a set of updates executed within a given time interval
  – Construct the PROV document using the provenance for updates computed on-the-fly
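The core idea of reenactment is to express an update as a query over the state before the update, so that its provenance can be computed like query provenance. The following minimal Python sketch applies this to the slide's example of setting t1's age to 70. It assumes a hypothetical relation user(id, state, age), and the old version is simply a list. In a real system that version would come from the database (for example via time travel). The CASE WHEN form in the docstring is our illustration, not GProM's generated SQL.

```python
# Hypothetical pre-update version of relation user(id, state, age).
USERS_V0 = [
    {"id": "t1", "state": "IL", "age": 20},
    {"id": "t2", "state": "IL", "age": 30},
    {"id": "t3", "state": "CA", "age": 40},
]


def reenact_update(rows, new_age, target_id):
    """Reenact UPDATE user SET age = new_age WHERE id = target_id as a query.

    Mirrors SELECT id, state, CASE WHEN id = target_id THEN new_age ELSE age END
    over the old version; each result is paired with the tuple it came from.
    """
    result = []
    for old in rows:
        new = dict(old)  # copy, so the old version stays unchanged
        if old["id"] == target_id:
            new["age"] = new_age
        result.append((new, old))
    return result


# Provenance: every new tuple version was derived from its old version.
for new, old in reenact_update(USERS_V0, 70, "t1"):
    print(old, "->", new)
```

Exporting the pairs as wasDerivedFrom edges between tuple versions would let a later query's PROV graph show that the corrected age of t1 came from an update rather than from the original import.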

Outline

① Introduction

② Related Work

③ Overview

④ Export and Import

⑤ Experimental Results

⑥ Conclusions and Future Work

Experimental Results

• TPC-H [14] benchmark datasets
  – Scale factors from 0.01 to 10 (10 MB up to 10 GB)

• Run on a machine with
  – 2 × AMD Opteron 3.3 GHz processors
  – 128 GB RAM
  – 4 × 1 TB 7.2K RPM disks configured in RAID 5

• Queries
  – Provenance of a three-way join between the relations customer, orders, and nation
  – Additional selection conditions control selectivity (and, thus, the size of the exported PROV-JSON document)

[14] TPC. TPC-H Benchmark Specification, 2009.

Experimental Results

[Figure: experimental results, panels for the 1 GB and 10 GB datasets]

Outline

① Introduction

② Related Work

③ Overview

④ Export and Import

⑤ Experimental Results

⑥ Conclusions and Future Work

Conclusions and Future Work

Conclusions
  • Integrated import and export of provenance represented as PROV-JSON into/from provenance-aware databases
  • Construct PROV graphs on-the-fly using SQL
  • Connect database provenance to imported PROV data

Future Work
  • Full implementation for updates
  • Automatic storage management (e.g., deduplication) for imported provenance
  • Automatic cross-referencing

Questions

• My webpage – http://www.cs.iit.edu/~dbgroup/people/xniu.php

• Our group's webpage – http://cs.iit.edu/~dbgroup/research/index.html

• GProM – http://www.cs.iit.edu/~dbgroup/research/gprom.php

Others

• Provenance querying
• Provenance for JSON