TRAMP: Understanding the Behaviour of Schema Mappings through Provenance. Boris Glavic (1), Gustavo Alonso (2), Renée J. Miller (1), Laura M. Haas (3). (1) University of Toronto, (2) ETH Zurich, (3) IBM Almaden Research Center. VLDB 2010, September 16, 2010

2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Jun 19, 2015

Boris Glavic

Though partially automated, developing schema mappings remains a complex and potentially error-prone task. In this paper, we present TRAMP (TRAnsformation Mapping Provenance), an extensive suite of tools supporting the debugging and tracing of schema mappings and transformation queries. TRAMP combines and extends data provenance with two novel notions, transformation provenance and mapping provenance, to explain the relationship between transformed data and those transformations and mappings that produced that data. In addition we provide query support for transformations, data, and all forms of provenance. We formally define transformation and mapping provenance, present an efficient implementation of both forms of provenance, and evaluate the resulting system through extensive experiments.
Transcript
Page 1: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Introduction Approach Transformation Provenance Implementation Results Conclusions

TRAMP: Understanding the Behaviour of Schema Mappings through Provenance

Boris Glavic (1), Gustavo Alonso (2), Renée J. Miller (1), Laura M. Haas (3)

(1) University of Toronto

(2) ETH Zurich

(3) IBM Almaden Research Center

VLDB 2010, September 16, 2010

Page 2: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Outline

1 Introduction

2 Approach

3 Transformation Provenance

4 Implementation

5 Results

6 Conclusions

Page 3: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

What is TRAMP?

TRAMP: Transformation Mapping Provenance

Novel “holistic” approach to help users understand schema mappings

Data Exchange
Data Integration

Provides query language access for

Mapping scenarios
Provenance
Data

Slide 1 of 27 Boris Glavic TRAMP

Page 4: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Data Exchange

Problem Statement

Given a source and a target schema

How to map data from the source to the target?

General Approach

1 Find correspondences between schema elements

2 Generate schema mappings from correspondences and schema constraints

3 Generate implementing transformations

4 Execute transformations

Page 5: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Data Exchange Process: 1) Find Correspondences

Correspondence

Attribute X represents the same information as attribute Y

Generated by automatic matcher or user

Example

Source: Employee(Name, Address), Address(Id, City, Street)

Target: Person(Name, LivesAt, Gender)

Page 7: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Data Exchange Process: 2) Generate Schema Mappings

Schema Mapping

Declarative constraints that model relationships between schemas (s-t tgds, or source-to-target tuple-generating dependencies)

Generated from correspondences and schema constraints

Example

For every employee with an associated address, there exists a person with the Name of the employee and the City of the address stored in the LivesAt attribute.

M1: ∀a, b, c, d: Employee(a, b) ∧ Address(b, c, d) ⇒ ∃f: Person(a, c, f)

M2: ∀a, b: Employee(a, b) ⇒ ∃c, d: Person(a, c, d)

Page 10: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Data Exchange Process: 3) Generate Implementing Transformations

Implementing Transformation

Schema mappings do not specify full target instance

Need executable transformation

Generated from schema mappings (XQuery, SQL, . . . )

Many-to-Many (Mappings-Transformations)

Example

SELECT Name, city AS LivesAt, NULL AS Gender
FROM Employee e JOIN Address a ON (e.Address = a.Id)
UNION
SELECT Name, NULL AS LivesAt, NULL AS Gender
FROM Employee e;

Page 11: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Data Exchange Process: 4) Execute

Source

Employee
Name   Address
Gerd   2
Nandy  NULL

Address
Id  City    Street
1   Prag    Krutz
2   Aachen  Pond

SELECT Name, city AS LivesAt, NULL AS Gender
FROM Employee e JOIN Address a ON (e.Address = a.Id)
UNION
SELECT Name, NULL AS LivesAt, NULL AS Gender
FROM Employee e;

→ Target

Person
Name   LivesAt  Gender
Gerd   Aachen   NULL
Gerd   NULL     NULL
Nandy  NULL     NULL
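
The execution step above can be reproduced with a small script. This is only a sketch: it assumes SQLite (via Python's sqlite3 module) as a stand-in for the actual database, and the function name run_exchange is hypothetical. It loads the two source relations from the slide and runs the implementing transformation unchanged.

```python
import sqlite3

def run_exchange():
    """Load the slide's source instance and run the implementing transformation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Employee (Name TEXT, Address INTEGER);
        CREATE TABLE Address (Id INTEGER, City TEXT, Street TEXT);
        INSERT INTO Employee VALUES ('Gerd', 2), ('Nandy', NULL);
        INSERT INTO Address VALUES (1, 'Prag', 'Krutz'), (2, 'Aachen', 'Pond');
    """)
    rows = conn.execute("""
        SELECT Name, City AS LivesAt, NULL AS Gender
        FROM Employee e JOIN Address a ON (e.Address = a.Id)
        UNION
        SELECT Name, NULL AS LivesAt, NULL AS Gender
        FROM Employee e
    """).fetchall()
    conn.close()
    # UNION has set semantics and no guaranteed order, so return a set.
    return set(rows)

person = run_exchange()
for row in sorted(person, key=repr):
    print(row)
```

The three resulting rows correspond to the Person instance shown on the slide, with None standing in for NULL.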

Page 12: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Understanding and Debugging Schema Mappings

Complex error-prone process

Many sources of error:

Faulty source data
Incorrect correspondences
Incorrect schema mappings
Incorrect transformations

Hard to trace error source

How to help the user?

Provide information that aids in debugging

Allow for combination and filtering ⇒ Query language

Page 14: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Information of Interest

1 Where is data derived from?

2 Which mapping created which data?

3 Which part of a transformation created which data?

4 Mapping Scenario (Interrelationships)

Page 18: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Where is Data Derived from?

Example

Person
Name   LivesAt  Gender
Gerd   Aachen   NULL
Gerd   NULL     NULL
Nandy  NULL     NULL

↑

SELECT Name, City AS LivesAt, NULL AS Gender
FROM Employee e JOIN Address a ON (e.Address = a.Id)
UNION
SELECT Name, NULL AS LivesAt, NULL AS Gender
FROM Employee e;

↑ Employee
Name   Address
Gerd   2
Nandy  NULL

↑ Address
Id  City    Street
1   Prag    Krutz
2   Aachen  Pond

↑ Employee
Name   Address
Gerd   2
Nandy  NULL

Page 19: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Data Provenance

Description

Which input tuples contributed to which output tuples?

Example

Person
Name   LivesAt  Gender
Gerd   Aachen   NULL
Gerd   NULL     NULL
Nandy  NULL     NULL

↑ Employee
Name   Address
Gerd   2
Nandy  NULL

↑ Address
Id  City    Street
1   Prag    Krutz
2   Aachen  Pond

↑ Employee
Name   Address
Gerd   2
Nandy  NULL
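
To make the question "which input tuples contributed to which output tuples?" concrete, the sketch below extends each result row with the source tuples it was derived from. It is an illustration only: SQLite stands in for the real system, the query is rewritten by hand rather than by a provenance-aware engine, the prov_* column names are hypothetical, and both accesses to Employee share one set of provenance columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT, Address INTEGER);
    CREATE TABLE Address (Id INTEGER, City TEXT, Street TEXT);
    INSERT INTO Employee VALUES ('Gerd', 2), ('Nandy', NULL);
    INSERT INTO Address VALUES (1, 'Prag', 'Krutz'), (2, 'Aachen', 'Pond');
""")

# Hand-written stand-in for a provenance rewrite: the original result columns
# are followed by the contributing input tuples. Inputs that did not
# contribute to a row are padded with NULL.
PROVENANCE_QUERY = """
    SELECT e.Name, a.City AS LivesAt, NULL AS Gender,
           e.Name AS prov_emp_name, e.Address AS prov_emp_address,
           a.Id AS prov_addr_id, a.City AS prov_addr_city,
           a.Street AS prov_addr_street
    FROM Employee e JOIN Address a ON (e.Address = a.Id)
    UNION ALL
    SELECT e.Name, NULL, NULL, e.Name, e.Address, NULL, NULL, NULL
    FROM Employee e
"""
data_provenance = set(conn.execute(PROVENANCE_QUERY).fetchall())
conn.close()

for row in sorted(data_provenance, key=repr):
    print(row[:3], "<-", row[3:])
```

Each printed line pairs a Person tuple with the Employee and Address values it came from; the row for (Gerd, Aachen, NULL) is the only one with a non-NULL Address part.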

Page 20: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Which Mapping Created which Data?

Example

Person
Name   LivesAt  Gender
Gerd   Aachen   NULL
Gerd   NULL     NULL
Nandy  NULL     NULL

Mapping M1

Page 22: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Mapping Provenance

Description

Which mappings generated a target tuple?

Example

Person
Name   LivesAt  Gender
Gerd   Aachen   NULL
Gerd   NULL     NULL
Nandy  NULL     NULL

Mapping M2

Page 23: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Which Transformation Part Created which Data?

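[Figure: annotated algebra tree of the query with seven nodes (a union over two projections, a join, and the relation accesses Employee, Address, and Employee); five nodes carry a 1-annotation and two carry a 0-annotation]

Example

Person
Name   LivesAt  Gender
Gerd   Aachen   NULL
Gerd   NULL     NULL
Nandy  NULL     NULL

SELECT Name, city AS LivesAt, NULL AS Gender
FROM Employee e JOIN Address a ON (e.Address = a.Id)
UNION
SELECT Name, NULL AS LivesAt, NULL AS Gender
FROM Employee e;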

Page 25: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Transformation Provenance

Description

Which parts of a transformation contributed to an output tuple?

Page 27: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Overview

Glue everything together

Store all mapping scenario elements and their inter-relationships in a database

Relational: instance data, schemas, transformations (views)
XML: Mappings, Correspondences

Provenance SQL extension

On-demand provenance computation
Relational/XML representation
PROVENANCE: Data provenance (relational)
TRANSXML: Transformation provenance (XML)

One-fits-all query language: SQL

SQL: data and data-provenance
XSLT: queries and transformation provenance

Page 28: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Related Work

Understanding Schema Mappings

Approaches using provenance

SPIDER [Chiticariu, Tan, VLDB ’06]
MXQL [Velegrakis, Miller, Mylopoulos, ICDE ’05]
Orchestra [Karvounarakis, Ives, Tannen, SIGMOD ’10]

Example-based approaches

Clio Data Viewer [Yan, Miller, Haas, Fagin, SIGMOD ’01]
MUSE [Alexe, Chiticariu, Miller, Tan, ICDE ’08]

Page 29: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Related Work

Supports \ System          SPIDER  MXQL  Orchestra
Data Provenance            X       *     X
Mapping Provenance         X       X     X
Transformation Provenance  -       -     -
Querying Provenance        -       X     X
Querying Mappings          -       -     -

Page 30: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Contributions

Provenance

Data Provenance: Perm (ASPJ-Set queries with nested subqueries)

Mapping Provenance: In combination with transformations (ASPJ-Set)

Transformation Provenance: NEW (ASPJ-Set)

Querying

Single query language: Data + Provenance + Mapping Scenarios

Page 31: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

System > Sum(components)

Advanced Examples

Which tuples were derived through mapping M1 or M2, without accessing source relation R, and are derived from tuple x in relation Y?

Are there tuples in the target relation R that have been derived from the same input tuple?
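
The second question can be sketched on the running Employee/Address example. The code below is an assumption-laden illustration, not TRAMP's query syntax: SQLite stands in for the real system, the provenance is produced by a hand-written query, SQLite's rowid serves as a hypothetical identifier for input tuples, and the names prov_employee and shared are made up for this sketch.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT, Address INTEGER);
    CREATE TABLE Address (Id INTEGER, City TEXT, Street TEXT);
    INSERT INTO Employee VALUES ('Gerd', 2), ('Nandy', NULL);
    INSERT INTO Address VALUES (1, 'Prag', 'Krutz'), (2, 'Aachen', 'Pond');
""")

# Each target tuple is paired with the rowid of the Employee tuple it
# was derived from (rowid is used here as a stand-in tuple identifier).
rows = conn.execute("""
    SELECT e.Name, a.City AS LivesAt, NULL AS Gender, e.rowid AS prov_employee
    FROM Employee e JOIN Address a ON (e.Address = a.Id)
    UNION ALL
    SELECT e.Name, NULL, NULL, e.rowid
    FROM Employee e
""").fetchall()
conn.close()

# Group target tuples by the input tuple they share.
by_input = defaultdict(set)
for name, lives_at, gender, prov_employee in rows:
    by_input[prov_employee].add((name, lives_at, gender))

# Keep only input tuples that produced more than one target tuple.
shared = {src: targets for src, targets in by_input.items() if len(targets) > 1}
print(shared)
```

On this instance only the first Employee tuple (Gerd, 2) yields two different Person tuples, which is the kind of overlap the question is meant to surface.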

Error Classification

Classification of error types in the paper

For each error type: an example of how to use TRAMP to debug it

Page 33: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Modelling Transformation Provenance

Representation

Transformation provenance of tuple t from query q:

(Set of) annotated algebra trees for q

Each node carries boolean annotation

1-Annotation: operator contributed
0-Annotation: operator did not contribute

Example

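[Figure: two annotated algebra trees over Employee, Address, and Employee with seven nodes each; the first carries five 1-annotations and two 0-annotations, the second three 1-annotations and four 0-annotations]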

Page 34: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Modelling Transformation Provenance cont.

Definition

For tuple t from query q

Node for op carries 1-annotation iff:

Evaluating the subtree under op over the data provenance of t does not return the empty set

Intuition

Data provenance is the information necessary to produce t
If none of this information “reaches” the output of the subtree under op ⇒ this part of the query did not contribute

Relation to data provenance

Abstraction
Different representation
Data provenance of all query steps
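
The definition can be tried out directly on the left-join query used in the later example slides. The sketch below is a naive reading of the definition, not the rewrite-based computation the talk goes on to describe: SQLite stands in for the real system, the SUBTREES table and the annotate function are hypothetical, and the sub-query for each operator is written by hand.

```python
import sqlite3

# One hand-written sub-query per operator of
#   SELECT Name, City AS LivesAt, NULL AS Gender
#   FROM Employee e LEFT JOIN Address a ON (e.Address = a.Id)
SUBTREES = {
    "Projection": "SELECT Name, City, NULL FROM Employee e "
                  "LEFT JOIN Address a ON (e.Address = a.Id)",
    "LeftJoin": "SELECT * FROM Employee e "
                "LEFT JOIN Address a ON (e.Address = a.Id)",
    "Employee": "SELECT * FROM Employee",
    "Address": "SELECT * FROM Address",
}

def annotate(employee_prov, address_prov):
    """Annotate each operator with 1 iff its subtree, evaluated over the
    data provenance of one result tuple, returns at least one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (Name TEXT, Address INTEGER)")
    conn.execute("CREATE TABLE Address (Id INTEGER, City TEXT, Street TEXT)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", employee_prov)
    conn.executemany("INSERT INTO Address VALUES (?, ?, ?)", address_prov)
    annotations = {
        op: int(conn.execute(sql).fetchone() is not None)
        for op, sql in SUBTREES.items()
    }
    conn.close()
    return annotations

# Data provenance of (Gerd, Aachen, NULL): one Employee and one Address tuple.
gerd = annotate([("Gerd", 2)], [(2, "Aachen", "Pond")])
# Data provenance of (Nandy, NULL, NULL): one Employee tuple, no Address tuple.
nandy = annotate([("Nandy", None)], [])
print(gerd)
print(nandy)
```

For the Nandy tuple the Address access returns nothing over the provenance, so it is the only operator annotated 0.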

Page 37: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Transformation Provenance Example

[Figure: annotated algebra tree (projection, left join, Employee, Address) in which all four operators carry a 1-annotation]

Example

Person
Name   LivesAt  Gender
Gerd   Aachen   NULL
Nandy  NULL     NULL

↑

SELECT Name, city AS LivesAt, NULL AS Gender
FROM Employee e LEFT JOIN Address a ON (e.Address = a.Id);

↑ Employee
Name   Address
Gerd   2
Nandy  NULL

↑ Address
Id  City    Street
1   Prag    Krutz
2   Aachen  Pond

Page 38: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Transformation Provenance Example

[Figure: the same algebra tree and example as on the previous slide, now with three 1-annotations and one 0-annotation]

Page 39: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Relational Representation

Data + Provenance Relation

Query result schema + transformation provenance attribute

Duplicate result tuple t

Add one annotated tree to each duplicate

XML representation of annotated tree

Example

<Query>

<Select>

<Attr name="Name"><Var>e.Name</Var></Attr>

...

<From><LeftJoin>

...

<NOT><Relation alias="a">Address</Relation></NOT>

...

Page 41: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

System Overview

Based On Perm

Modified PostgreSQL server

Provenance language constructs implemented as query rewrites

“Use SQL to compute the provenance of SQL”

[Figure: system architecture. User → JDBC → query (SELECT PROVENANCE * FROM ...) → Parser & Analyzer (Postgres Parser, Postgres Analyser) → Query Tree → TRAMP Module → Rewritten Query Tree → Optimizer → Query Plan → Execution Engine (Executor) → Query Results with provenance columns (a, prov_a, prov_b)]

Page 42: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Internal Bitset Representation

Annotated Algebra Trees only vary in annotation

⇒ Factor out the tree

⇒ Annotations = set of nodes with 1-annotation

⇒ Use a bit-vector

Compact representation
Set union = bitwise-or
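
A minimal sketch of the bit-vector idea follows. The operator order in OPERATORS and the helper names to_bitset and from_bitset are assumptions made for illustration; the only point carried over from the slide is that a set of 1-annotated nodes fits into one integer and that set union becomes a bitwise OR.

```python
# Fixed enumeration of the operators of one query; the leftmost bit
# belongs to the first operator (this ordering is an assumption).
OPERATORS = ["Projection", "LeftJoin", "Employee", "Address"]

def to_bitset(contributing):
    """Encode a set of contributing operators as an integer bit-vector."""
    bits = 0
    for position, op in enumerate(OPERATORS):
        if op in contributing:
            bits |= 1 << (len(OPERATORS) - 1 - position)
    return bits

def from_bitset(bits):
    """Decode an integer bit-vector back into the set of operators."""
    return {
        op
        for position, op in enumerate(OPERATORS)
        if bits & (1 << (len(OPERATORS) - 1 - position))
    }

upper = to_bitset({"Projection", "LeftJoin", "Employee"})
address = to_bitset({"Address"})
combined = upper | address  # set union as bitwise OR

print(format(upper, "04b"), format(address, "04b"), format(combined, "04b"))
```

Because every annotated tree of a query shares the same shape, only the integer has to travel with each tuple; the tree itself can be stored once.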

Page 43: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Transformation Provenance Computation

Query Rewrite

Rewrite query q into qT

qT propagates bitsets throughout the query (partial annotated trees)

Result construction function fXML: builds XML

No data provenance needed!

Algebraic rewrite rules (correctness proven)

Optimizations

Depending on operators some annotations are static

static = independent of t

Static sets are generated beforehand

Page 44: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Rewrite Algorithm

1 Analyze query to

Enumerate the operators and attach singleton bitsets
Determine static bit-sets

2 Apply rewrite rules

Recursively to each operator in the query

3 Add application of fXML

Page 47: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Algorithm Step 1

SELECT Name, City AS LivesAt, NULL AS Gender
FROM Employee e LEFT JOIN Address a ON (e.Address = a.Id);

[Figure: algebra tree of the query (projection, left join, Employee, Address) with one singleton bitset per operator: 1000, 0100, 0010, 0001]

For this query

Projection + Left Join + Employee is fixed ⇒ use one bit-set: 1110
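
Step 1 can be sketched as a small tree walk. Everything here is illustrative: the Op class, the pre-order numbering, and in particular the rule used to decide which operators are static (only the nullable right side of a left join is treated as tuple-dependent) are assumptions chosen to match this one example, not the analysis used by the actual system.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    kind: str
    children: list = field(default_factory=list)
    bitset: int = 0

def preorder(op):
    """Yield the operators of the tree top-down, left to right."""
    yield op
    for child in op.children:
        yield from preorder(child)

def assign_singletons(root):
    """Enumerate the operators and give each one a singleton bitset."""
    ops = list(preorder(root))
    for position, op in enumerate(ops):
        op.bitset = 1 << (len(ops) - 1 - position)
    return ops

def dynamic_ops(op, nullable=False):
    """Operators whose annotation may differ between result tuples.
    Assumed rule: everything under the right input of a left join."""
    result = [op] if nullable else []
    for index, child in enumerate(op.children):
        child_nullable = nullable or (op.kind == "LeftJoin" and index == 1)
        result += dynamic_ops(child, child_nullable)
    return result

tree = Op("Projection", [Op("LeftJoin", [Op("Employee"), Op("Address")])])
ops = assign_singletons(tree)
dynamic = dynamic_ops(tree)

# All remaining operators are static and can be merged into one bitset.
static_bitset = 0
for op in ops:
    if all(op is not d for d in dynamic):
        static_bitset |= op.bitset

print({op.kind: format(op.bitset, "04b") for op in ops})
print(format(static_bitset, "04b"))
```

Under these assumptions the merged static bitset comes out as 1110, and only the Address access keeps its own bit to be decided per tuple.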

Page 49: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Algorithm Step 2

SELECT Name, City AS LivesAt, NULL AS Gender,

bitor(

1110,

CASE

WHEN a.Id IS NULL THEN 0000

ELSE 0001

END) AS trans_prov

FROM Employee e

LEFT JOIN Address a

ON (e.Address = a.Id);
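
The rewritten query can be run in miniature. In this sketch SQLite stands in for the modified PostgreSQL server, integers stand in for the bit-vector type, and SQLite's | operator replaces bitor; the constant 14 is the static bitset 1110 written in decimal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT, Address INTEGER);
    CREATE TABLE Address (Id INTEGER, City TEXT, Street TEXT);
    INSERT INTO Employee VALUES ('Gerd', 2), ('Nandy', NULL);
    INSERT INTO Address VALUES (1, 'Prag', 'Krutz'), (2, 'Aachen', 'Pond');
""")

# 14 = binary 1110 (static part); the CASE adds bit 0001 only when the
# left join actually found a matching Address tuple.
rows = conn.execute("""
    SELECT Name, City AS LivesAt, NULL AS Gender,
           (14 | (CASE WHEN a.Id IS NULL THEN 0 ELSE 1 END)) AS trans_prov
    FROM Employee e
    LEFT JOIN Address a ON (e.Address = a.Id)
""").fetchall()
conn.close()

# Render the integer bitsets as four-digit bit strings for readability.
result = {
    (name, lives_at, gender, format(bits, "04b"))
    for name, lives_at, gender, bits in rows
}
for row in sorted(result, key=repr):
    print(row)
```

The two rows carry different bitsets because only the Gerd tuple found a join partner in Address.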

Page 52: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Rewrite Result

Name   LivesAt  Gender  trans_prov
Gerd   Aachen   NULL    1111
Nandy  NULL     NULL    1110

Page 53: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Algorithm Step 3

SELECT Name, City AS LivesAt, NULL AS Gender,

fXML(

bitor(

1110,

CASE

WHEN a.Id IS NULL THEN 0000

ELSE 0001

END)) AS trans_prov

FROM Employee e

LEFT JOIN Address a

ON (e.Address = a.Id);

Page 54: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Rewrite Result

Name   LivesAt  Gender  trans_prov
Gerd   Aachen   NULL    <Query><Select>...<From>...
Nandy  NULL     NULL    <Query><Select>...<From>...

Gerd

<Query>

<Select>

<Attr name="Name"><Var>e.Name</Var></Attr>

...

<From><LeftJoin>

...

<Relation alias="a">Address</Relation>

...

Page 55: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Rewrite Result

Nandy

<Query>

<Select>

<Attr name="Name"><Var>e.Name</Var></Attr>

...

<From><LeftJoin>

...

<NOT><Relation alias="a">Address</Relation></NOT>

...
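
The step from a bitset to the XML shown above can be sketched as follows. The function f_xml is a hypothetical stand-in for the result construction function fXML: the element names are taken from the slide, but the exact nesting (for example, From as a sibling of Select), the bit positions, and the restriction to annotating only the two relation accesses are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

EMPLOYEE_BIT = 0b0010  # assumed bit position of the Employee access
ADDRESS_BIT = 0b0001   # assumed bit position of the Address access

def relation(name, alias, contributed):
    """Build a Relation element, wrapped in NOT if it did not contribute."""
    rel = ET.Element("Relation", alias=alias)
    rel.text = name
    if contributed:
        return rel
    wrapper = ET.Element("NOT")
    wrapper.append(rel)
    return wrapper

def f_xml(bits):
    """Turn a bitset into an annotated XML tree for the example query."""
    query = ET.Element("Query")
    select = ET.SubElement(query, "Select")
    attr = ET.SubElement(select, "Attr", name="Name")
    ET.SubElement(attr, "Var").text = "e.Name"
    from_clause = ET.SubElement(query, "From")
    join = ET.SubElement(from_clause, "LeftJoin")
    join.append(relation("Employee", "e", bool(bits & EMPLOYEE_BIT)))
    join.append(relation("Address", "a", bool(bits & ADDRESS_BIT)))
    return ET.tostring(query, encoding="unicode")

gerd_xml = f_xml(0b1111)
nandy_xml = f_xml(0b1110)
print(gerd_xml)
print(nandy_xml)
```

Only the output for bitset 1110 contains a NOT element around the Address relation, mirroring the difference between the Gerd and Nandy results.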

Page 57: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Experimental Results

Based on Amalgam benchmark (publication data)

Execution times for implementing transformations with and without transformation provenance

[Figure: relative overhead (y-axis, 1 to 2.8) versus instance size in thousands of publications (x-axis, 1 to 1000) for three series: W/O provenance, Only Bitset, Transformation Prov]

Page 59: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Conclusion

TRAMP = holistic approach for understanding mappings

Perm approach for data provenance

Transformation provenance

Querying of

Data
Provenance
Mapping scenario information

Future Work

Integrate with data exchange/integration systems

Combine with example-based approaches like MUSE

Provide hints on how to change mappings to generate an expected result [Tran, Chan, SIGMOD ’10]

Page 60: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Questions

Perm/TRAMP Open Source Release

On SourceForge

http://permdbms.sourceforge.net/

Page 61: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Storing Mapping Scenarios

Storing Mapping Elements

<Correspondence>

<from>

<Table name="Employee">

<Column>name</Column>

</Table>

</from>

<to>

<Table name="Person">

<Column>name</Column>

</Table>

</to>

</Correspondence>

Page 62: 2010 VLDB - TRAMP: Understanding the Behavior of Schema Mappings through Provenance

Storing Mapping Scenarios

Mapping-Transformation Interrelationships

Annotate parts of implementing transformations with corresponding mappings.

SELECT ANNOT ('M1') Name, city AS LivesAt,

NULL AS Gender

FROM Employee e

JOIN Address a

ON (e.Address = a.Id)

UNION

SELECT ANNOT ('M2') Name, NULL AS LivesAt,

NULL AS Gender

FROM Employee e;
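
The effect of these annotations can be imitated in plain SQL to see which mapping produced which Person tuple. This is a sketch under stated assumptions: SQLite has no ANNOT construct, so each branch carries its mapping name as an ordinary literal column (called mapping here, a made-up name), and the grouping into mapping_provenance is done in Python.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Name TEXT, Address INTEGER);
    CREATE TABLE Address (Id INTEGER, City TEXT, Street TEXT);
    INSERT INTO Employee VALUES ('Gerd', 2), ('Nandy', NULL);
    INSERT INTO Address VALUES (1, 'Prag', 'Krutz'), (2, 'Aachen', 'Pond');
""")

# Each branch of the transformation is tagged with the mapping it
# implements; UNION ALL keeps one row per (mapping, tuple) derivation.
rows = conn.execute("""
    SELECT 'M1' AS mapping, Name, City AS LivesAt, NULL AS Gender
    FROM Employee e JOIN Address a ON (e.Address = a.Id)
    UNION ALL
    SELECT 'M2', Name, NULL, NULL
    FROM Employee e
""").fetchall()
conn.close()

# Collect, for every target tuple, the set of mappings that produced it.
mapping_provenance = defaultdict(set)
for mapping, *person in rows:
    mapping_provenance[tuple(person)].add(mapping)

for person, mappings in sorted(mapping_provenance.items(), key=repr):
    print(person, "<-", sorted(mappings))
```

On the example instance the tuple with a city is attributed to M1 and the two tuples without a city to M2.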
