
MUSE: Mapping Understanding and deSign by Example · alumni.soe.ucsc.edu/~abogdan/talks/muse-icde08-talk.pdf · 2008-04-06


MUSE: Mapping Understanding and deSign by Example

Bogdan Alexe UC Santa Cruz

Laura Chiticariu UC Santa Cruz

Renée J. Miller U. of Toronto

Wang-Chiew Tan UC Santa Cruz

April 8, 2008


Schema Mappings

2 / 23

■ One of the first steps in information integration is to specify the relationships (schema mappings) between schemas. This is known to be a difficult task.

CompDB: Rcd
  Companies: Set of
    Company: Rcd
      cbranch
      cname
      location
  Projects: Set of
    Project: Rcd
      pid
      pname
      cbranch
      manager
  Employees: Set of
    Employee: Rcd
      eid
      ename
      contact

OrgDB: Rcd
  Orgs: Set of
    Org: Rcd
      oname
      Projects: Set of
        Project: Rcd
          pname
          manager
  Employees: Set of
    Employee: Rcd
      eid
      ename

Visual specification through value correspondences


Source schema S, Target schema T

Visual spec.

Mapping systems, e.g.
  IBM Clio
  Altova MapForce
  Stylus Studio
  MS BizTalk Mapper

Declarative specification

Executable code (XSLT, XQuery, Java)

Example: Data Exchange (source instance I, target instance J)


Designing Mappings

3 / 23

■ Mapping systems can automate only part of the mapping process

◆ Typically, intricate manual work is needed to perfect the specification

■ The visual specification may be ambiguous. Mapping systems make default choices to resolve the ambiguities

◆ These choices may not correspond to a designer’s intentions

◆ The mapping designer might refine the specification manually


Real Life Example

4 / 23

In real-life scenarios, mappings are extremely complicated


Designing Mappings

5 / 23

■ Specifications are often impossible to understand through visual inspection

■ Few tools are available to assist in understanding and designing alternative mappings

■ MUSE is a tool designed towards this end

■ In MUSE, we focus on declarative specifications

Source schema S, Target schema T

Visual spec.

Declarative specification

Executable code (XSLT, XQuery, Java)

Advantages:
– easier to reason about
– reusable for various tasks


Our vision

6 / 23

■ MUSE is a mapping design wizard that uses (real) data examples to help designers understand, design and refine schema mappings

■ MUSE leverages familiar data examples to help understand mappings

◆ real data examples are used whenever possible

◆ otherwise, synthetic examples are constructed

■ Currently, MUSE has two features

◆ Muse-G: design grouping semantics

◆ Muse-D: disambiguate alternative mappings


MUSE Workflow

7 / 23

MUSE

Inputs: Mapping Specification; Real Source Instance (if available)

Generation: Real/Synthetic Data Examples

Examination: Mapping designer inspects data examples

Essentially Yes/No Answers

Refinement: Grouping Semantics, Disambiguation


Example

8 / 23

CompDB: Rcd
  Companies: Set of
    Company: Rcd
      cbranch
      cname
      location
  Projects: Set of
    Project: Rcd
      pid
      pname
      cbranch
      manager
  Employees: Set of
    Employee: Rcd
      eid
      ename
      contact

OrgDB: Rcd
  Orgs: Set of
    Org: Rcd
      oname
      Projects: Set of
        Project: Rcd
          pname
          manager
  Employees: Set of
    Employee: Rcd
      eid
      ename

Declarative Mapping

for
  c in CompDB.Companies
  p in CompDB.Projects
  e in CompDB.Employees
satisfy
  p.cbranch = c.cbranch
  e.eid = p.manager
exists
  o in OrgDB.Orgs
  p1 in o.Projects
  e1 in OrgDB.Employees
satisfy
  p1.manager = e1.eid
where
  c.cname = o.oname
  e.eid = e1.eid
  e.ename = e1.ename
  p.pname = p1.pname
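A minimal Python sketch of what the declarative mapping asks for may help here. It assumes a hypothetical in-memory representation (lists of dicts) filled with the sample values used elsewhere in the talk; the function names are illustrative and are not part of MUSE or of any mapping system. The sketch evaluates the for/satisfy clause over the source and applies the where equalities. It deliberately does not decide how projects are grouped into sets.

```python
# Hypothetical in-memory source instance (sample values from the talk's running example).
companies = [
    {"cbranch": "Redmond", "cname": "Microsoft", "location": "USA"},
    {"cbranch": "S. Valley", "cname": "Microsoft", "location": "USA"},
]
projects = [
    {"pid": "P1", "pname": "DB", "cbranch": "Redmond", "manager": "e4"},
    {"pid": "P2", "pname": "Web", "cbranch": "S. Valley", "manager": "e5"},
]
employees = [
    {"eid": "e4", "ename": "John", "contact": "x234"},
    {"eid": "e5", "ename": "Anna", "contact": "x888"},
]


def source_matches(companies, projects, employees):
    """Yield every (c, p, e) binding that passes the 'for ... satisfy' clause."""
    for c in companies:
        for p in projects:
            for e in employees:
                if p["cbranch"] == c["cbranch"] and e["eid"] == p["manager"]:
                    yield c, p, e


def required_target_facts(companies, projects, employees):
    """Apply the 'where' equalities: the values the target instance must contain."""
    facts = []
    for c, p, e in source_matches(companies, projects, employees):
        facts.append({
            "oname": c["cname"],      # c.cname = o.oname
            "pname": p["pname"],      # p.pname = p1.pname
            "manager": e["eid"],      # p1.manager = e1.eid and e.eid = e1.eid
            "eid": e["eid"],
            "ename": e["ename"],      # e.ename = e1.ename
        })
    return facts


facts = required_target_facts(companies, projects, employees)
```

Each entry of `facts` says which organization, project and employee values must co-occur in the target; which Projects set a project lands in is left open.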


Grouping Projects

Example source:

Companies
  Redmond    Microsoft  USA
  S. Valley  Microsoft  USA
Projects
  P1  DB   Redmond    e4
  P2  Web  S. Valley  e5

Group by cbranch:

Orgs
  Microsoft
    Projects:
      DB   e4
  Microsoft
    Projects:
      Web  e5

Group by cname:

Orgs
  Microsoft
    Projects:
      DB   e4
      Web  e5
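The two outcomes can be reproduced with a short Python sketch. This is an illustration under stated assumptions, not MUSE's implementation: the dict layout and the `exchange` function are hypothetical, and the join with Employees is omitted on the assumption that every manager appears there. The tuple of grouping values stands in for a Skolem term, so companies with equal tuples share one Projects set.

```python
from collections import defaultdict

# Sample source values from the slide; the dict layout is an assumption.
companies = [
    {"cbranch": "Redmond", "cname": "Microsoft", "location": "USA"},
    {"cbranch": "S. Valley", "cname": "Microsoft", "location": "USA"},
]
projects = [
    {"pid": "P1", "pname": "DB", "cbranch": "Redmond", "manager": "e4"},
    {"pid": "P2", "pname": "Web", "cbranch": "S. Valley", "manager": "e5"},
]


def exchange(companies, projects, group_by):
    """Nest projects under organizations, one Projects set per distinct grouping tuple."""
    orgs = []                         # distinct (oname, term) pairs, in first-seen order
    set_contents = defaultdict(list)  # term -> project rows stored in that set
    for c in companies:
        term = tuple(c[a] for a in group_by)  # plays the role of the Skolem term
        for p in projects:
            if p["cbranch"] != c["cbranch"]:
                continue
            if (c["cname"], term) not in orgs:
                orgs.append((c["cname"], term))
            row = (p["pname"], p["manager"])
            if row not in set_contents[term]:
                set_contents[term].append(row)
    return [(oname, set_contents[term]) for oname, term in orgs]


by_branch = exchange(companies, projects, ("cbranch",))
by_name = exchange(companies, projects, ("cname",))
```

With `("cbranch",)` the result holds two Microsoft entries with one project each; with `("cname",)` it holds a single Microsoft entry with both projects.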


Grouping Function

o.Projects = SKProjects(c.cbranch, c.cname, c.location)

Group by what subset of {cbranch, cname, location}?
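To make the size of the design space concrete, the candidate argument lists for SKProjects can be enumerated. The helper name below is hypothetical; the sketch only illustrates that three attributes already give eight candidate groupings.

```python
from itertools import combinations


def candidate_groupings(attrs):
    """All subsets of attrs, smallest first; each is a possible argument list for SKProjects."""
    return [subset
            for size in range(len(attrs) + 1)
            for subset in combinations(attrs, size)]


candidates = candidate_groupings(("cbranch", "cname", "location"))
```

The number of candidates doubles with every additional attribute, which suggests why asking the designer to pick one directly does not scale.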


Muse-G: Grouping Semantics Design

9 / 23

■ Goal: infer a grouping function that has the same effect as the one intended by the designer

■ Muse-G probes each possible grouping attribute: start with cbranch

Example source

Companies
  Redmond    Microsoft  USA
  S. Valley  Microsoft  USA
Projects
  P1  DB   Redmond    e4
  P2  Web  S. Valley  e5
Employees
  e4  John  x234
  e5  Anna  x888

Target Scenario 1: group by cbranch

Orgs
  Microsoft
    Projects: SK(Redmond, y)
      DB   e4
  Microsoft
    Projects: SK(S. Valley, y)
      Web  e5
Employees
  e4  John
  e5  Anna

Target Scenario 2: do not group by cbranch

Orgs
  Microsoft
    Projects: SK(y)
      DB   e4
      Web  e5
Employees
  e4  John
  e5  Anna

y ⊆ { Microsoft, USA }
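The note y ⊆ { Microsoft, USA } indicates that the two scenarios differ whichever of the remaining attributes end up in the grouping function. A small Python check of that property, under assumptions: the function names are hypothetical, and "differ" is approximated by counting distinct Skolem terms (that is, distinct Projects sets) over the example.

```python
from itertools import combinations

# The two company tuples of the example source; the dict layout is an assumption.
companies = [
    {"cbranch": "Redmond", "cname": "Microsoft", "location": "USA"},
    {"cbranch": "S. Valley", "cname": "Microsoft", "location": "USA"},
]


def skolem_terms(companies, attrs):
    """Distinct argument tuples of SK over the example; each names a separate Projects set."""
    return {tuple(c[a] for a in attrs) for c in companies}


def separates(companies, probed, remaining):
    """True if, for every subset y of the remaining attributes, grouping by
    probed + y yields more Projects sets than grouping by y alone."""
    for size in range(len(remaining) + 1):
        for y in combinations(remaining, size):
            with_probed = skolem_terms(companies, (probed,) + y)
            without_probed = skolem_terms(companies, y)
            if len(with_probed) <= len(without_probed):
                return False
    return True


result = separates(companies, "cbranch", ("cname", "location"))
```

For this example `result` is true: the companies agree on cname and location and differ only on cbranch, so the designer's choice between the scenarios reveals exactly whether cbranch belongs in the grouping.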


Muse-G: Second Question

10 / 23

■ The next probed attribute is cname

Example source

Companies
  S. Valley  Microsoft  USA
  Mt. View   Google     USA
Projects
  P1  DB   S. Valley  e4
  P4  Web  Mt. View   e6
Employees
  e4  John  x234
  e6  Kat   x331

Target Scenario 1: group by cname

Orgs
  Microsoft
    Projects: SK(Microsoft, y)
      DB   e4
  Google
    Projects: SK(Google, y)
      Web  e6
Employees
  e4  John
  e6  Kat

Target Scenario 2: do not group by cname

Orgs
  Microsoft
    Projects: SK(y)
      DB   e4
      Web  e6
  Google
    Projects: SK(y)
      DB   e4
      Web  e6
Employees
  e4  John
  e6  Kat

y ⊆ { USA }

■ The wizard continues to probe the remaining possible grouping attributes
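The probing loop can be sketched as follows. This is a simplification under assumptions: `wants_grouping` is a hypothetical callback standing in for the designer choosing Target Scenario 1 (group) or Target Scenario 2 (do not group) for the probed attribute, and the construction of the example shown with each question is left out.

```python
def infer_grouping(attributes, wants_grouping):
    """Ask one yes/no question per candidate attribute and keep the 'yes' answers."""
    chosen = []
    for attr in attributes:
        if wants_grouping(attr):  # designer picked the 'group by attr' scenario
            chosen.append(attr)
    return tuple(chosen)


# Hypothetical designer who intends one Projects set per company name.
intended = {"cname"}
answers = infer_grouping(("cbranch", "cname", "location"), lambda attr: attr in intended)
```

Here three questions are enough to single out `("cname",)` among the eight possible subsets.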


Obtaining Source Examples

11 / 23

Running queries over the real source instance I

Example: probing on cname

Query:

  Companies(c1,n1,l1) ∧ Projects(p1,pn1,c1,e1) ∧ Employees(e1,en1,cn1) ∧
  Companies(c2,n2,l1) ∧ Projects(p2,pn2,c2,e2) ∧ Employees(e2,en2,cn2) ∧
  n1 ≠ n2

Non-empty result: Real Example

  Companies
    S. Valley   Microsoft   USA
    Mt. View    Google      USA

  Projects
    P1   DB    S. Valley   e4
    P4   Web   Mt. View    e6

  Employees
    e4   John   x234
    e6   Kat    x331

Empty result: Synthetic Example

  Companies
    c1   n1   l1
    c2   n2   l1

  Projects
    p1   pn1   c1   e1
    p2   pn2   c2   e2

  Employees
    e1   en1   cn1
    e2   en2   cn2
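A minimal sketch of this lookup, assuming the source instance fits in memory as lists of tuples. The functions `joined_rows` and `find_probe_example` are hypothetical names that mirror the query on the slide; they are not the MUSE implementation.

```python
from itertools import product

# Tuples are assumed to follow the CompDB schema: Companies(cbranch, cname,
# location), Projects(pid, pname, cbranch, manager), Employees(eid, ename, contact).


def joined_rows(companies, projects, employees):
    """Return (company, project, employee) triples satisfying the joins."""
    return [
        (c, p, e)
        for c, p, e in product(companies, projects, employees)
        if p[2] == c[0] and p[3] == e[0]
    ]


def find_probe_example(companies, projects, employees):
    """Look for two joined rows with the same location but different cname."""
    rows = joined_rows(companies, projects, employees)
    for (c1, p1, e1), (c2, p2, e2) in product(rows, repeat=2):
        if c1[2] == c2[2] and c1[1] != c2[1]:
            return {"Companies": [c1, c2], "Projects": [p1, p2], "Employees": [e1, e2]}
    # Empty result: fall back to a synthetic example made of the query variables.
    return {
        "Companies": [("c1", "n1", "l1"), ("c2", "n2", "l1")],
        "Projects": [("p1", "pn1", "c1", "e1"), ("p2", "pn2", "c2", "e2")],
        "Employees": [("e1", "en1", "cn1"), ("e2", "en2", "cn2")],
    }


companies = [("S. Valley", "Microsoft", "USA"), ("Mt. View", "Google", "USA")]
projects = [("P1", "DB", "S. Valley", "e4"), ("P4", "Web", "Mt. View", "e6")]
employees = [("e4", "John", "x234"), ("e6", "Kat", "x331")]

real = find_probe_example(companies, projects, employees)
# With a single company no pair can differ on cname, so the query is empty.
synthetic = find_probe_example(companies[:1], projects, employees)
```

The nested loop is quadratic in the number of joined rows; a real system would push the self-join and the inequality into the query engine instead.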


Muse-G with FDs

12 / 23

■ Considering functional dependencies in the source can reduce the number of questions posed to the designer

■ Two mappings M1, M2 have the same effect if, for any source instance I, the result of exchanging I with M1 is the “same” as (that is, homomorphically equivalent to) the result of exchanging I with M2

Proposition. If an FD A → B holds, then a mapping M that groups by A has the same effect as a mapping M′ that groups by A ∪ C, where C ⊆ B.

■ Suppose cbranch is a key; then we may save some questions

◆ If the designer chooses Scenario 1 (including cbranch in the grouping function), probing on cname or location is no longer necessary
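The saving can be sketched with a standard attribute-closure computation: once the chosen grouping attributes functionally determine a candidate, the proposition says that adding the candidate does not change the effect of the mapping, so it need not be probed. The helper names `closure` and `still_to_probe` and the candidate list are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def still_to_probe(chosen, candidates, fds):
    """Candidates not implied by the attributes already chosen for grouping."""
    implied = closure(chosen, fds)
    return [a for a in candidates if a not in implied]


# Assumption from the slide: cbranch is a key of Companies.
fds = [({"cbranch"}, {"cname", "location"})]
candidates = ["cbranch", "cname", "location", "pid"]

remaining = still_to_probe({"cbranch"}, candidates, fds)
print(remaining)
```

Using the closure applies the proposition repeatedly, so FDs that chain through several attributes are covered as well.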


Muse-G: Properties

13 / 23

Proposition (Completeness). If there are n possible grouping attributes for a nested set S, then the questions asked by Muse-G explore the entire space of 2^n grouping functions. Muse-G asks at most n questions to infer the desired grouping semantics for S.

Proposition (Small examples). At each probe, Muse-G constructs a source example of size at most twice the number of conjuncts in the for clause of the mapping.

■ Incremental design: group more or less starting from an existing grouping function

■ Design for a specific source instance: reduce the number of questions

◆ Muse-G identifies attributes whose inclusion/exclusion as arguments of grouping functions is inconsequential
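The counting behind the completeness proposition can be checked with a toy sketch: one yes/no probe per attribute is enough to single out any of the 2^n attribute subsets. The function `infer_grouping` and the simulated designer are assumptions for illustration; the real wizard shows example instances rather than asking about attribute names directly.

```python
from itertools import combinations


def infer_grouping(attributes, ask):
    """Probe each attribute once; return the chosen set and the question count."""
    chosen = []
    questions = 0
    for attr in attributes:
        questions += 1
        if ask(attr):  # designer picks the scenario that groups by attr
            chosen.append(attr)
    return chosen, questions


attributes = ["cbranch", "cname", "location"]
reachable = set()
for size in range(len(attributes) + 1):
    for intended in combinations(attributes, size):
        # Simulated designer who has the grouping `intended` in mind.
        chosen, questions = infer_grouping(attributes, lambda a: a in intended)
        assert questions <= len(attributes)
        reachable.add(tuple(chosen))

print(len(reachable))
```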


Ambiguous Mappings

14 / 23

CompDB: Rcd
  Projects: Set of
    Project: Rcd
      pid
      pname
      manager
      tech-lead
  Employees: Set of
    Employee: Rcd
      eid
      ename
      contact

OrgDB: Rcd
  Projects: Set of
    Project: Rcd
      pname
      supervisor
      email

for
  p in CompDB.Projects
  e1 in CompDB.Employees
  e2 in CompDB.Employees
satisfy
  e1.eid = p.manager
  e2.eid = p.tech-lead
exists
  p1 in OrgDB.Projects
where
  p.pname = p1.pname
  p1.supervisor = e1.ename or e2.ename      (ambiguous element)
  p1.email = e1.contact or e2.contact       (ambiguous element)

■ This mapping is ambiguous
■ There are four alternative interpretations

  p1.supervisor   e1.ename     e1.ename     e2.ename     e2.ename
  p1.email        e1.contact   e2.contact   e1.contact   e2.contact
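The count of four follows from taking one candidate per ambiguous element. A short sketch, with the dictionary layout chosen purely for illustration:

```python
from itertools import product

# Each ambiguous target element with its candidate source expressions.
ambiguous = {
    "p1.supervisor": ["e1.ename", "e2.ename"],
    "p1.email": ["e1.contact", "e2.contact"],
}

# One interpretation per combination of candidates (2 x 2 = 4 here).
interpretations = [
    dict(zip(ambiguous, choice)) for choice in product(*ambiguous.values())
]
for interpretation in interpretations:
    print(interpretation)
```

In general the number of interpretations is the product of the candidate counts, so it grows quickly with the number of ambiguous elements.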


Muse-D: Disambiguating Mappings

15 / 23

■ Key idea: provide an example that illustrates the alternative interpretations in a compact way

Source example

  Projects
    P1   DB   e4   e5

  Employees
    e4   John   john@ibm
    e5   Anna   anna@ibm

Target example (the designer makes two choices)

  Orgs
    Projects:
      DB   { John, Anna }   { john@ibm, anna@ibm }

■ The mapping designer makes one choice for each ambiguous element
■ Each decision removes one ambiguity

◆ E.g., choosing “Anna” as the supervisor and “john@ibm” as the email. Since e1 is bound to the manager (e4, John) and e2 to the tech-lead (e5, Anna), this selects e2.ename and e1.contact in:

  p1.supervisor = e1.ename or e2.ename

  p1.email = e1.contact or e2.contact
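How a choice of values maps back to a choice of expressions can be sketched as follows. The data comes from the example; the names `evaluate`, `compact_target`, and `resolve` are hypothetical, and the sketch relies on the candidate values of each element being distinct.

```python
# Source example.
project = {"pid": "P1", "pname": "DB", "manager": "e4", "tech-lead": "e5"}
employees = {
    "e4": {"ename": "John", "contact": "john@ibm"},
    "e5": {"ename": "Anna", "contact": "anna@ibm"},
}

# Bindings of the for clause: e1 is the manager, e2 the tech-lead.
bindings = {
    "e1": employees[project["manager"]],
    "e2": employees[project["tech-lead"]],
}

ambiguous = {
    "supervisor": ["e1.ename", "e2.ename"],
    "email": ["e1.contact", "e2.contact"],
}


def evaluate(expr):
    """Evaluate an expression such as 'e1.ename' against the bindings."""
    var, attr = expr.split(".")
    return bindings[var][attr]


# Compact target: every candidate value is listed once per ambiguous element.
compact_target = {
    elem: [evaluate(expr) for expr in exprs] for elem, exprs in ambiguous.items()
}


def resolve(choices):
    """Translate the values the designer picked back into mapping expressions."""
    return {
        elem: ambiguous[elem][compact_target[elem].index(value)]
        for elem, value in choices.items()
    }


resolved = resolve({"supervisor": "Anna", "email": "john@ibm"})
print(resolved)
```

If two candidates evaluated to the same value, the designer's pick could not be traced back to a single expression, which is why the example needs distinct names and contacts.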


Obtaining Source Examples

16 / 23

Running queries over the real source instance

Query:

  Projects(p1,pn1,e1,e2) ∧ Employees(e1,en1,cn1) ∧ Employees(e2,en2,cn2) ∧
  en1 ≠ en2 ∧ cn1 ≠ cn2

Non-empty result: Real Example

  Projects
    P1   DB   e4   e5

  Employees
    e4   John   john@ibm
    e5   Anna   anna@ibm

Empty result: Synthetic Example

  Projects
    p1   pn1   e1   e2

  Employees
    e1   en1   cn1
    e2   en2   cn2
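The same query can be phrased in SQL. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table layout, the column name `tech_lead` (renamed from tech-lead to avoid quoting), and the synthetic fallback tuple are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Projects (pid TEXT, pname TEXT, manager TEXT, tech_lead TEXT);
    CREATE TABLE Employees (eid TEXT, ename TEXT, contact TEXT);
    INSERT INTO Projects VALUES ('P1', 'DB', 'e4', 'e5');
    INSERT INTO Employees VALUES ('e4', 'John', 'john@ibm');
    INSERT INTO Employees VALUES ('e5', 'Anna', 'anna@ibm');
    """
)

# Two employee bindings per project, with differing names and contacts.
PROBE = """
    SELECT p.pid, p.pname,
           e1.eid, e1.ename, e1.contact,
           e2.eid, e2.ename, e2.contact
    FROM Projects p
    JOIN Employees e1 ON e1.eid = p.manager
    JOIN Employees e2 ON e2.eid = p.tech_lead
    WHERE e1.ename <> e2.ename AND e1.contact <> e2.contact
    LIMIT 1
"""

row = conn.execute(PROBE).fetchone()
if row is None:
    # Empty result: use the query variables themselves as a synthetic example.
    row = ("p1", "pn1", "e1", "en1", "cn1", "e2", "en2", "cn2")
print(row)
```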


Muse-D: Properties

17 / 23

■ For each ambiguous mapping, the designer is presented with a single example

Proposition (Completeness). The single example differentiates among all the alternative interpretations of the ambiguous mapping. The mapping designer has to make a number of choices equal to the number of ambiguous elements.

Proposition (Small examples). The number of tuples in the example source instance is the number of conjuncts in the for clause of the mapping.


Experiments: Setting

18 / 23

Mapping     Size of real      Sets with refinable   Number of   Ambiguous   Alternative
scenario    source instance   grouping              mappings    mappings    interpretations

Mondial     1MB               8                     26          7           208
DBLP        2.6MB             6                     4           0           4
TPCH        10MB              4                     5           1           20
Amalgam     2MB               2                     14          0           14

■ Mapping system: Clio
■ Query evaluation: DB2 v9, Saxon-B 8.9
■ MUSE implementation: Java 6


Experiments: Muse-G

19 / 23

Mapping     Average # of          Number of questions   % times found   Average time to get    Grouping
scenario    grouping attributes   (average)             real example    real example (s)       strategy

Mondial     13.1                  2.6                   38%             0.014                  G1
                                  8.5                   41%             0.187                  G2
                                  2.9                   40%             0.015                  G3

DBLP        11                    1.5                   17%             0.450                  G1
                                  11                    11%             0.337                  G2
                                  1.5                   17%             0.454                  G3

TPCH        26.7                  1.5                   0%              0.785                  G1
                                  17                    12%             0.893                  G2
                                  1.5                   0%              0.782                  G3

Amalgam     14.1                  2                     29%             0.013                  G1
                                  3                     52%             0.043                  G2
                                  3                     52%             0.030                  G3

G1: group by all possible attributes (cbranch, cname, location, pid, ...)
G2: group by all attributes exported above the set (cname)
G3: group by all exported attributes (cname, pname, eid, ename)


Experiments: Muse-D

20 / 23

Mapping     Alternatives   Number of   Size of source example   Number of ambiguous values
scenario    encoded        questions   (# of tuples)            in target instance

Mondial     208            7           3–4                      4–5
TPCH        16             1           9                        4


Related Work

21 / 23

■ There is much related work; we focus here on the closest system
■ We were inspired by DataViewer [Yan et al. 01]

Feature \ System          MUSE                             DataViewer

Examples to be analyzed   Compact representation           As many as alternative
                          (e.g. 7)                         interpretations (e.g. 208)
Grouping Semantics        Yes                              No
Data Models               Relational and XML               Relational
Mapping Language          Source-target mappings (GLAV)    SQL


Conclusion

22 / 23

■ MUSE: a mapping design wizard

◆ Use examples to understand, design, refine schema mappings

◆ Work with complete and small data examples rather than complex specifications

◆ Focus on two important aspects of a mapping specification

■ Grouping semantics

■ Interpretation of ambiguous mappings


23 / 23

Thank you