MUSE: Mapping Understanding and deSign by Example
Bogdan Alexe, UC Santa Cruz
Laura Chiticariu, UC Santa Cruz
Renée J. Miller, U. of Toronto
Wang-Chiew Tan, UC Santa Cruz
April 8, 2008
Schema Mappings
2 / 23
■ One of the first steps in information integration is to specify the relationships (schema mappings) between schemas. This is known to be a difficult task.
CompDB: Rcd
  Companies: Set of
    Company: Rcd (cbranch, cname, location)
  Projects: Set of
    Project: Rcd (pid, pname, cbranch, manager)
  Employees: Set of
    Employee: Rcd (eid, ename, contact)
OrgDB: Rcd
  Orgs: Set of
    Org: Rcd (oname)
      Projects: Set of
        Project: Rcd (pname, manager)
  Employees: Set of
    Employee: Rcd (eid, ename)
Visual specification through value correspondences
Source schema S, Target schema T
Visual spec. → mapping systems, e.g., IBM Clio, Altova MapForce, Stylus Studio, MS BizTalk Mapper → declarative specification → executable code (XSLT, XQuery, Java)
Example: data exchange, transforming a source instance I into a target instance J
Designing Mappings
3 / 23
■ Mapping systems can automate only part of the mapping process
◆ Typically, intricate manual work is needed to perfect the specification
■ The visual specification may be ambiguous. Mapping systems make default choices to resolve the ambiguities
◆ These choices may not correspond to a designer’s intentions
◆ The mapping designer might refine the specification manually
Real-Life Example
4 / 23
In real-life scenarios, mappings are extremely complicated
Designing Mappings
5 / 23
■ Specifications are often impossible to understand through visual inspection
■ Few tools are available to assist in understanding and designing alternative mappings
■ MUSE is a tool designed towards this end
■ In MUSE, we focus on declarative specifications
Source schema S, Target schema T → Visual spec. → Declarative specification → Executable code (XSLT, XQuery, Java)
Advantages of declarative specifications:
◆ easier to reason about
◆ reusable for various tasks
Our vision
6 / 23
■ MUSE is a mapping design wizard that uses (real) data examples to help designers understand, design, and refine schema mappings
■ MUSE leverages familiar data examples to help understand mappings
◆ real data examples are used whenever possible
◆ otherwise, synthetic examples are constructed
■ Currently, MUSE has two features
◆ Muse-G: design grouping semantics
◆ Muse-D: disambiguate alternative mappings
MUSE Workflow
7 / 23
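MUSE takes as input a mapping specification and a real source instance (if available):
■ Generation: real or synthetic data examples are produced
■ Examination: the mapping designer inspects the data examples and gives essentially yes/no answers
■ Refinement: the answers are used to refine the grouping semantics and to disambiguate the mapping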
Example
8 / 23
CompDB: Rcd
  Companies: Set of
    Company: Rcd (cbranch, cname, location)
  Projects: Set of
    Project: Rcd (pid, pname, cbranch, manager)
  Employees: Set of
    Employee: Rcd (eid, ename, contact)
OrgDB: Rcd
  Orgs: Set of
    Org: Rcd (oname)
      Projects: Set of
        Project: Rcd (pname, manager)
  Employees: Set of
    Employee: Rcd (eid, ename)
Declarative Mapping
  for     c in CompDB.Companies, p in CompDB.Projects, e in CompDB.Employees
  satisfy p.cbranch = c.cbranch and e.eid = p.manager
  exists  o in OrgDB.Orgs, p1 in o.Projects, e1 in OrgDB.Employees
  satisfy p1.manager = e1.eid
  where   c.cname = o.oname and e.eid = e1.eid and e.ename = e1.ename and p.pname = p1.pname
Grouping Projects: example source
  Companies: (Redmond, Microsoft, USA), (S. Valley, Microsoft, USA)
  Projects: (P1, DB, Redmond, e4), (P2, Web, S. Valley, e5)
Group by cbranch:
  Orgs: (Microsoft, Projects: {(DB, e4)})
        (Microsoft, Projects: {(Web, e5)})
Group by cname:
  Orgs: (Microsoft, Projects: {(DB, e4), (Web, e5)})
Grouping function: o.Projects = SK_Projects(c.cbranch, c.cname, c.location)
Group by which subset of {cbranch, cname, location}?
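The effect of the grouping function can be reproduced with a small Python sketch. It is an illustration, not MUSE's or Clio's implementation: it assumes the example source is stored as lists of tuples, omits the Employees part of the mapping for brevity, and models the Skolem term SK_Projects(...) as a tuple of the chosen attribute values, so two projects land in the same nested set exactly when those values agree. The function name `exchange_orgs` is hypothetical.

```python
# Example source from the slide (Employees omitted for brevity).
# Companies(cbranch, cname, location); Projects(pid, pname, cbranch, manager)
COMPANIES = [("Redmond", "Microsoft", "USA"), ("S. Valley", "Microsoft", "USA")]
PROJECTS = [("P1", "DB", "Redmond", "e4"), ("P2", "Web", "S. Valley", "e5")]
ATTRS = ("cbranch", "cname", "location")


def exchange_orgs(group_by):
    """Hypothetical sketch: build target Orgs with o.Projects grouped by `group_by`.

    The Skolem term SK_Projects(args) is modelled as a tuple of the chosen
    attribute values; equal terms denote the same nested Projects set.
    """
    nested_sets = {}  # Skolem term -> list of (pname, manager)
    orgs = []         # distinct Org tuples as (oname, Skolem term)
    for company in COMPANIES:
        c = dict(zip(ATTRS, company))
        for _pid, pname, cbranch, manager in PROJECTS:
            if cbranch != c["cbranch"]:  # satisfy p.cbranch = c.cbranch
                continue
            sk = ("SK_Projects",) + tuple(c[a] for a in ATTRS if a in group_by)
            members = nested_sets.setdefault(sk, [])
            if (pname, manager) not in members:
                members.append((pname, manager))
            if (c["cname"], sk) not in orgs:  # where c.cname = o.oname
                orgs.append((c["cname"], sk))
    # Replace each Skolem term by the nested set it identifies.
    return [(oname, nested_sets[sk]) for oname, sk in orgs]


print(exchange_orgs({"cbranch"}))
print(exchange_orgs({"cname"}))
```

Grouping by cbranch prints `[('Microsoft', [('DB', 'e4')]), ('Microsoft', [('Web', 'e5')])]`, two Org tuples with separate Projects sets, while grouping by cname prints `[('Microsoft', [('DB', 'e4'), ('Web', 'e5')])]`, matching the two target instances above. In this instance location makes no difference, because both companies share the value USA.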
Muse-G: Grouping Semantics Design
9 / 23
■ Goal: infer a grouping function that has the same effect as the one intended by the designer
■ Muse-G probes each possible grouping attribute, starting with cbranch
Example source
  Companies: (Redmond, Microsoft, USA), (S. Valley, Microsoft, USA)
  Projects: (P1, DB, Redmond, e4), (P2, Web, S. Valley, e5)
  Employees: (e4, John, x234), (e5, Anna, x888)
Target Scenario 1: group by cbranch
  Orgs: (Microsoft, Projects = SK(Redmond, y): {(DB, e4)})
        (Microsoft, Projects = SK(S. Valley, y): {(Web, e5)})
  Employees: (e4, John), (e5, Anna)
Target Scenario 2: do not group by cbranch
  Orgs: (Microsoft, Projects = SK(y): {(DB, e4), (Web, e5)})
  Employees: (e4, John), (e5, Anna)
where y ⊆ {Microsoft, USA} stands for the values of the attributes not yet probed
Muse-G: Second Question
10 / 23
■ The next probed attribute is cname
Example source
  Companies: (S. Valley, Microsoft, USA), (Mt. View, Google, USA)
  Projects: (P1, DB, S. Valley, e4), (P4, Web, Mt. View, e6)
  Employees: (e4, John, x234), (e6, Kat, x331)
Target Scenario 1: group by cname
  Orgs: (Microsoft, Projects = SK(Microsoft, y): {(DB, e4)})
        (Google, Projects = SK(Google, y): {(Web, e6)})
  Employees: (e4, John), (e6, Kat)
Target Scenario 2: do not group by cname
  Orgs: (Microsoft, Projects = SK(y): {(DB, e4), (Web, e6)})
        (Google, Projects = SK(y): {(DB, e4), (Web, e6)})
  Employees: (e4, John), (e6, Kat)
where y ⊆ {USA}
■ The wizard continues to probe the remaining possible grouping attributes
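Why one question per attribute suffices can be sketched by tracking the candidate grouping functions explicitly. In this Python sketch, which is a simplification and not MUSE's implementation, every subset of the grouping attributes is a candidate, the designer is simulated by a hypothetical `answer` callback (True meaning Scenario 1, "group by this attribute"), and each answer discards the candidates that disagree with it.

```python
from itertools import combinations


def all_grouping_functions(attrs):
    """Every subset of attrs is a possible argument list: 2**n candidates."""
    return [frozenset(c) for k in range(len(attrs) + 1)
            for c in combinations(attrs, k)]


def muse_g_probe(attrs, answer):
    """Hypothetical sketch of the probing loop; returns (candidates, questions)."""
    remaining = all_grouping_functions(attrs)
    questions = 0
    for attr in attrs:
        questions += 1          # show Scenario 1 vs Scenario 2 for attr
        group = answer(attr)    # True: designer picks Scenario 1
        remaining = [g for g in remaining if (attr in g) == group]
    return remaining, questions


# Simulated designer who intends to group Projects by cname only.
intended = {"cname"}
result, asked = muse_g_probe(["cbranch", "cname", "location"],
                             lambda a: a in intended)
print(result, asked)
```

This prints `[frozenset({'cname'})] 3`: after n = 3 yes/no answers exactly one of the 2^3 = 8 candidate grouping functions remains.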
Obtaining Source Examples
11 / 23
Running queries over the real source instance I
Example: probing on cname
Query:
  Companies(c1, n1, l1) ∧ Projects(p1, pn1, c1, e1) ∧ Employees(e1, en1, cn1) ∧
  Companies(c2, n2, l1) ∧ Projects(p2, pn2, c2, e2) ∧ Employees(e2, en2, cn2) ∧
  n1 ≠ n2
Non-empty result → real example:
  Companies: (S. Valley, Microsoft, USA), (Mt. View, Google, USA)
  Projects: (P1, DB, S. Valley, e4), (P4, Web, Mt. View, e6)
  Employees: (e4, John, x234), (e6, Kat, x331)
Empty result → synthetic example:
  Companies: (c1, n1, l1), (c2, n2, l1)
  Projects: (p1, pn1, c1, e1), (p2, pn2, c2, e2)
  Employees: (e1, en1, cn1), (e2, en2, cn2)
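The real-or-synthetic strategy can be sketched in Python by evaluating the conjunctive query naively over an in-memory instance and falling back to the synthetic example when the result is empty. Assumptions: relations are lists of tuples in the attribute order used by the query, eid is a key of Employees, and nested loops stand in for the query engine; `probe_cname` and `source_example` are hypothetical names.

```python
from itertools import product

# Synthetic example used when the query has no answer on the real instance.
SYNTHETIC = {
    "Companies": [("c1", "n1", "l1"), ("c2", "n2", "l1")],
    "Projects": [("p1", "pn1", "c1", "e1"), ("p2", "pn2", "c2", "e2")],
    "Employees": [("e1", "en1", "cn1"), ("e2", "en2", "cn2")],
}


def probe_cname(companies, projects, employees):
    """Return one answer to the probing query for cname, or None if empty.

    Companies(cbranch, cname, location), Projects(pid, pname, cbranch, manager),
    Employees(eid, ename, contact); eid is assumed to be a key.
    """
    emp = {e[0]: e for e in employees}
    # Companies(c, n, l) ∧ Projects(p, pn, c, e) ∧ Employees(e, en, cn)
    chains = [(c, p, emp[p[3]]) for c, p in product(companies, projects)
              if p[2] == c[0] and p[3] in emp]
    for (c1, p1, e1), (c2, p2, e2) in product(chains, repeat=2):
        if c1[2] == c2[2] and c1[1] != c2[1]:  # shared l1 and n1 ≠ n2
            return {"Companies": [c1, c2], "Projects": [p1, p2],
                    "Employees": [e1, e2]}
    return None


def source_example(companies, projects, employees):
    """Prefer a real example; otherwise fall back to the synthetic one."""
    return probe_cname(companies, projects, employees) or SYNTHETIC
```

On the instance shown above this returns the Microsoft/Google real example; on an instance with a single company it returns the synthetic example.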
Muse-G with FDs
12 / 23
■ Considering functional dependencies (FDs) in the source can reduce the number of questions posed to the designer
■ Two mappings M1, M2 have the same effect if, for any source instance I, the result of exchanging I with M1 is the “same” as the result of exchanging I with M2, i.e., the two results are homomorphically equivalent
Proposition. If an FD A → B holds, then a mapping M that groups by A has the same effect as a mapping M′ that groups by A ∪ C, where C ⊆ B.
■ Suppose cbranch is a key of Companies; then some questions can be skipped
◆ If the designer chooses Scenario 1 (including cbranch in the grouping function), probing on cname or location is no longer necessary
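The intuition behind the proposition can be checked on a single instance: if A → B holds, tuples that agree on A also agree on every C ⊆ B, so grouping by A and by A ∪ C yields the same groups. The Python sketch below compares the partitions induced by the two attribute sets. Partition equality is used here as a simplified stand-in for the “same effect” (homomorphic equivalence) notion, the rows are taken from the running example, and the helper names are hypothetical; it illustrates the proposition rather than proving it.

```python
def partition(rows, attrs):
    """Group row indices by their values on attrs (one group per Skolem term)."""
    groups = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in sorted(attrs))
        groups.setdefault(key, []).append(i)
    return sorted(groups.values())


def fd_holds(rows, lhs, rhs):
    """Check the FD lhs -> rhs on this instance."""
    seen = {}
    for row in rows:
        k = tuple(row[a] for a in sorted(lhs))
        v = tuple(row[a] for a in sorted(rhs))
        if seen.setdefault(k, v) != v:
            return False
    return True


companies = [
    {"cbranch": "Redmond", "cname": "Microsoft", "location": "USA"},
    {"cbranch": "S. Valley", "cname": "Microsoft", "location": "USA"},
    {"cbranch": "Mt. View", "cname": "Google", "location": "USA"},
]
A, B = {"cbranch"}, {"cname", "location"}
print(fd_holds(companies, A, B))  # cbranch is a key in this instance
print(partition(companies, A) == partition(companies, A | {"cname"}))
print(partition(companies, {"cname"}))
```

This prints `True`, `True`, and `[[0, 1], [2]]`: adding cname to the key cbranch changes nothing, whereas grouping by cname alone merges the two Microsoft branches.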
Muse-G: Properties
13 / 23
Proposition (Completeness). If there are n possible grouping attributes for a nested set S, then the questions asked by Muse-G explore the entire space of 2^n grouping functions. Muse-G asks at most n questions to infer the desired grouping semantics for S.
Proposition (Small examples). At each probe, Muse-G constructs a source example of size at most twice the number of conjuncts in the for clause of the mapping.
■ Incremental design: group more or less, starting from an existing grouping function
■ Design for a specific source instance: reduce the number of questions
◆ Muse-G identifies attributes whose inclusion/exclusion as arguments of grouping functions is inconsequential
Ambiguous Mappings
14 / 23
CompDB: Rcd
  Projects: Set of
    Project: Rcd (pid, pname, manager, tech-lead)
  Employees: Set of
    Employee: Rcd (eid, ename, contact)
OrgDB: Rcd
  Projects: Set of
    Project: Rcd (pname, supervisor, email)
  for     p in CompDB.Projects, e1 in CompDB.Employees, e2 in CompDB.Employees
  satisfy e1.eid = p.manager and e2.eid = p.tech-lead
  exists  p1 in OrgDB.Projects
  where   p.pname = p1.pname
          and p1.supervisor = e1.ename or e2.ename   (ambiguous element)
          and p1.email = e1.contact or e2.contact    (ambiguous element)
■ This mapping is ambiguous
■ There are four alternative interpretations of (p1.supervisor, p1.email):
  (e1.ename, e1.contact), (e1.ename, e2.contact), (e2.ename, e1.contact), (e2.ename, e2.contact)
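The count of four comes from choosing one candidate independently for each ambiguous element, so k ambiguous elements with m1, ..., mk candidates give m1 × ... × mk interpretations (here 2 × 2). A minimal Python sketch, assuming a hypothetical dictionary encoding of the ambiguous elements:

```python
from itertools import product

# Ambiguous elements and their candidate source expressions (from the mapping).
AMBIGUOUS = {
    "p1.supervisor": ["e1.ename", "e2.ename"],
    "p1.email": ["e1.contact", "e2.contact"],
}


def interpretations(ambiguous):
    """One interpretation per combination of candidate choices."""
    elements = list(ambiguous)
    return [dict(zip(elements, choice))
            for choice in product(*ambiguous.values())]


for alt in interpretations(AMBIGUOUS):
    print(alt["p1.supervisor"], alt["p1.email"])
```

The four printed lines list the interpretations in the same order as above, from `e1.ename e1.contact` to `e2.ename e2.contact`.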
Muse-D: Disambiguating Mappings
15 / 23
■ Key idea: provide an example that illustrates the alternative interpretations in a compact way
Example source
  Projects: (P1, DB, e4, e5)
  Employees: (e4, John, john@ibm), (e5, Anna, anna@ibm)
Target
  Projects: DB; supervisor: John or Anna; email: john@ibm or anna@ibm
Designer makes two choices
■ The mapping designer makes one choice for each ambiguous element
■ Each decision removes one ambiguity
◆ E.g., choosing “Anna” as the supervisor and “john@ibm” as the email resolves
  p1.supervisor = e1.ename or e2.ename to p1.supervisor = e2.ename
  p1.email = e1.contact or e2.contact to p1.email = e1.contact
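Because the example source gives e1 and e2 distinct names and contacts, every value shown in the target comes from exactly one candidate expression, so each value the designer picks identifies one choice. A minimal Python sketch under that assumption; the dictionary encoding and the helpers `evaluate` and `resolve` are hypothetical:

```python
# Example source from the slide: P1 has manager e4 and tech-lead e5.
PROJECT = {"pid": "P1", "pname": "DB", "manager": "e4", "tech-lead": "e5"}
EMPLOYEES = {"e4": {"ename": "John", "contact": "john@ibm"},
             "e5": {"ename": "Anna", "contact": "anna@ibm"}}
AMBIGUOUS = {"p1.supervisor": ["e1.ename", "e2.ename"],
             "p1.email": ["e1.contact", "e2.contact"]}


def evaluate(expr):
    """Evaluate e1.attr or e2.attr, where e1 is the manager and e2 the tech-lead."""
    var, attr = expr.split(".")
    eid = PROJECT["manager"] if var == "e1" else PROJECT["tech-lead"]
    return EMPLOYEES[eid][attr]


def resolve(element, picked_value):
    """Map the value the designer picked back to the candidate that produces it."""
    matches = [c for c in AMBIGUOUS[element] if evaluate(c) == picked_value]
    if len(matches) != 1:
        raise ValueError("the example does not separate the candidates")
    return matches[0]


print(resolve("p1.supervisor", "Anna"), resolve("p1.email", "john@ibm"))
```

This prints `e2.ename e1.contact`, i.e., the interpretation selected by the two choices in the example above.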
Obtaining Source Examples
16 / 23
Running queries over the real source instance
Query:
  Projects(p1, pn1, e1, e2) ∧ Employees(e1, en1, cn1) ∧ Employees(e2, en2, cn2) ∧
  en1 ≠ en2 ∧ cn1 ≠ cn2
Non-empty result → real example:
  Projects: (P1, DB, e4, e5)
  Employees: (e4, John, john@ibm), (e5, Anna, anna@ibm)
Empty result → synthetic example:
  Projects: (p1, pn1, e1, e2)
  Employees: (e1, en1, cn1), (e2, en2, cn2)
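This query can likewise be sketched in Python as a naive evaluation over in-memory tuples with a synthetic fallback. Assumptions: Projects tuples are ordered (pid, pname, manager, tech-lead), eid is a key of Employees, and `probe_ambiguity` and `source_example` are hypothetical helpers rather than MUSE's code.

```python
# Synthetic example used when the query has no answer on the real instance.
SYNTHETIC = {"Projects": [("p1", "pn1", "e1", "e2")],
             "Employees": [("e1", "en1", "cn1"), ("e2", "en2", "cn2")]}


def probe_ambiguity(projects, employees):
    """Return one answer to the query, or None if the result is empty.

    Projects(pid, pname, manager, tech-lead), Employees(eid, ename, contact);
    eid is assumed to be a key of Employees.
    """
    emp = {e[0]: e for e in employees}
    for p in projects:
        e1, e2 = emp.get(p[2]), emp.get(p[3])
        if e1 is None or e2 is None:
            continue
        if e1[1] != e2[1] and e1[2] != e2[2]:  # en1 ≠ en2 ∧ cn1 ≠ cn2
            return {"Projects": [p], "Employees": [e1, e2]}
    return None


def source_example(projects, employees):
    """Prefer a real example; otherwise fall back to the synthetic one."""
    return probe_ambiguity(projects, employees) or SYNTHETIC
```

A project whose manager and tech-lead are the same employee cannot separate the interpretations, so such an instance falls back to the synthetic example.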
Muse-D: Properties
17 / 23
■ For each ambiguous mapping, the designer is presented with a single example
Proposition (Completeness). The single example differentiates among all the alternative interpretations of the ambiguous mapping. The mapping designer has to make a number of choices equal to the number of ambiguous elements.
Proposition (Small examples). The number of tuples in the example source instance is the number of conjuncts in the for clause of the mapping.
Experiments: Setting
18 / 23
Mapping scenario | Size of real source instance | Sets with refinable grouping | Number of mappings | Ambiguous mappings | Alternative interpretations
Mondial | 1MB | 8 | 26 | 7 | 208
DBLP | 2.6MB | 6 | 4 | 0 | 4
TPCH | 10MB | 4 | 5 | 1 | 20
Amalgam | 2MB | 2 | 14 | 0 | 14
■ Mapping system: Clio
■ Query evaluation: DB2 v9, Saxon-B 8.9
■ MUSE implementation: Java 6
Experiments: Muse-G
19 / 23
Mapping scenario | Average # of grouping attributes | Number of questions (average) | % of times a real example was found | Average time to get a real example (s) | Grouping strategy
Mondial | 13.1 | 2.6 | 38% | 0.014 | G1
Mondial | 13.1 | 8.5 | 41% | 0.187 | G2
Mondial | 13.1 | 2.9 | 40% | 0.015 | G3
DBLP | 11 | 1.5 | 17% | 0.450 | G1
DBLP | 11 | 11 | 11% | 0.337 | G2
DBLP | 11 | 1.5 | 17% | 0.454 | G3
TPCH | 26.7 | 1.5 | 0% | 0.785 | G1
TPCH | 26.7 | 17 | 12% | 0.893 | G2
TPCH | 26.7 | 1.5 | 0% | 0.782 | G3
Amalgam | 14.1 | 2 | 29% | 0.013 | G1
Amalgam | 14.1 | 3 | 52% | 0.043 | G2
Amalgam | 14.1 | 3 | 52% | 0.030 | G3
G1: group by all possible attributes (cbranch, cname, location, pid, ...)
G2: group by all attributes exported above the set (cname)
G3: group by all exported attributes (cname, pname, eid, ename)
Experiments: Muse-D
20 / 23
Mapping scenario | Alternatives encoded | Number of questions | Size of source example (# of tuples) | Number of ambiguous values in target instance
Mondial | 208 | 7 | 3–4 | 4–5
TPCH | 16 | 1 | 9 | 4
Related Work
21 / 23
■ There are many related works; we focus here on the closest one
■ We were inspired by DataViewer [Yan et al. 01]
Feature | MUSE | DataViewer
Examples to be analyzed | Compact representation (e.g., 7) | One per alternative interpretation (e.g., 208)
Grouping semantics | Yes | No
Data models | Relational and XML | Relational
Mapping language | Source-target mappings (GLAV) | SQL
Conclusion
22 / 23
■ MUSE: a mapping design wizard
◆ Uses examples to understand, design, and refine schema mappings
◆ Works with complete and small data examples rather than complex specifications
◆ Focuses on two important aspects of a mapping specification:
■ Grouping semantics
■ Interpretation of ambiguous mappings
23 / 23
Thank you