Lessons from the TSIMMIS Project. Yannis Papakonstantinou, Department of Computer Science & Engineering, University of California, San Diego

Lessons from the TSIMMIS Project

Jan 02, 2016


Page 1: Lessons from the TSIMMIS Project

Lessons from the TSIMMIS Project

Yannis Papakonstantinou
Department of Computer Science & Engineering
University of California, San Diego

Page 2: Lessons from the TSIMMIS Project

Overview

• TSIMMIS’ goals, technical challenges, and solutions

• Insufficiencies of the TSIMMIS’ framework

• Going forward

Page 3: Lessons from the TSIMMIS Project

Information Resides on Heterogeneous Information Sources

• different interfaces
• different data representations
• redundant and conflicting information

[Figure: example sources: WWW, TickerTape, personal database, Dialog]

Page 4: Lessons from the TSIMMIS Project

Goal: System Providing Integrated View of Heterogeneous Data

• collects and combines information
• provides integrated view, uniform user interface

[Figure: an Integration System placed over the sources WWW, personal database, TickerTape, and Dialog]

Page 5: Lessons from the TSIMMIS Project

The Wrapper and Mediator Architecture

[Figure: Client, Mediator, two Wrappers, and the sources TickerTape and Dialog, with a Common Data Model. Data labels: stock market prices, business reports, portfolios for each company]

Page 6: Lessons from the TSIMMIS Project

The Data Warehousing Approach to Integration

[Figure: Client, Mediator with a Stored Integrated View, two Wrappers, and the sources TickerTape and Dialog]

Page 7: Lessons from the TSIMMIS Project

The Lazy Integration Approach

Query Decomposition, Translation and Result Fusion

[Figure: Client, Mediator, two Wrappers, and the sources TickerTape and Dialog. Data labels: IBM portfolio, IBM price, IBM related reports, IBM related reports (in common model)]

Page 8: Lessons from the TSIMMIS Project

Wrappers & Mediators from High-Level Specifications

[Figure: Client, Mediator, two Wrappers, and two Sources. The Mediator Specification Interpreter takes a Mediator Specification; the Wrapper Generator takes a Wrapper Specification]

Page 9: Lessons from the TSIMMIS Project

Challenge: Sources Without a Well-Structured Schema

• semistructured
– irregular
– deeply nested
– cross-referenced

• incomplete schema knowledge
– autonomous
– dynamic

Examples
• HTML pages
• SGML documents
• genome data
• chemical structures
• bibliographic information
• results of the integration process

Page 10: Lessons from the TSIMMIS Project

Challenge: Different and Limited Source Capabilities

[Figure: Client, Mediator (U = A + B), Wrapper(A), and Wrapper(B). The request "retrieve IBM data" appears on the Client-to-Mediator link and on both Mediator-to-Wrapper links]

Page 11: Lessons from the TSIMMIS Project

Mediator has to Adapt to Query Capabilities of Sources

[Figure: Client, Mediator (U = A + B), Wrapper(A), and Wrapper(B). Request labels: "retrieve IBM data" and "retrieve everything". Note on the figure: (A) does not allow selection]

Page 12: Lessons from the TSIMMIS Project

Part B

• Semistructured Data Representation

• Mediator Generation

• Wrapper Generation

• Capabilities-Based Rewriting

Page 13: Lessons from the TSIMMIS Project

Representation of Semistructured Information using OEM

<http://www/~doe, faculty, {&f1,&l1,&r1}>
<&f1, first_name, “John”>
<&l1, last_name, “Doe”>
<&r1, rank, “professor”>

[Figure callouts: semantic object-id, structural object-id, label, Atomic Value, Set Value]
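The OEM objects above can be mirrored in a small in-memory structure. The following is only a sketch under stated assumptions: the `OemObject` class and the `build_store` and `children` helpers are hypothetical names introduced for illustration, not part of TSIMMIS.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical in-memory encoding of OEM objects; not the TSIMMIS implementation.
@dataclass(frozen=True)
class OemObject:
    oid: str                      # semantic (e.g. a URL) or structural (e.g. "&f1")
    label: str
    value: Union[str, frozenset]  # atomic value, or a set of sub-object ids


def build_store(objects: List[OemObject]) -> Dict[str, OemObject]:
    """Index objects by object-id so that set values can be dereferenced."""
    return {o.oid: o for o in objects}


def children(store: Dict[str, OemObject], oid: str):
    """Return (label, value) pairs of the direct sub-objects of `oid`, sorted by label."""
    obj = store[oid]
    if not isinstance(obj.value, frozenset):
        return []  # atomic objects have no sub-objects
    return sorted((store[c].label, store[c].value) for c in obj.value)


STORE = build_store([
    OemObject("http://www/~doe", "faculty", frozenset({"&f1", "&l1", "&r1"})),
    OemObject("&f1", "first_name", "John"),
    OemObject("&l1", "last_name", "Doe"),
    OemObject("&r1", "rank", "professor"),
])
```

Calling `children(STORE, "http://www/~doe")` follows the set value to the three atomic sub-objects, which is the same traversal the graph view of OEM makes explicit.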

Page 14: Lessons from the TSIMMIS Project

Graph Representation of OEM Data

<http://www/~doe, faculty, {&f1,&l1,&r1}>
<&f1, first_name, “John”>
<&l1, last_name, “Doe”>
<&r1, rank, “professor”>

[Figure: node http://www/~doe, labeled faculty, with subobjects first_name “John”, last_name “Doe”, rank “professor”]

Page 15: Lessons from the TSIMMIS Project

OEM Structures Represent Arbitrary Labeled Graphs

[Figure: two faculty objects in one graph.
http://www/~doe: faculty with first_name “John”, last_name “Doe”, rank “professor”
http://www/~smith: faculty with name “Mary Smith”, project “Air DB”, paper
paper: author name “John Doe”, author name “Mary Smith”, title “Thin Air DB”]

Page 16: Lessons from the TSIMMIS Project

Overview

• Semistructured Data Representation

• Mediator Generation
– Example of mediator specification
– Language expressiveness
– Implementation and performance

• Wrapper Generation

• Capabilities-Based Rewriting

Page 17: Lessons from the TSIMMIS Project

Merge Information Relating to a Faculty

s1: faculty name “John Doe” rank “professor” papers ...
s2: person name “John Doe” birthday “April 1”

Integrated view: faculty name “John Doe” rank “professor” birthday “April 1” papers ...

Page 18: Lessons from the TSIMMIS Project

Mediator Specification Example

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1
<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

s1: faculty name “John Doe” rank “professor” papers ...
s2: person name “John Doe” birthday “April 1”

Integrated view: faculty name “John Doe” rank “professor” birthday “April 1” papers ...
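One way to read the two rules is as a loop that binds N to each name and attaches every <L V> subobject to the view object identified by N. The sketch below assumes a hypothetical list-of-tuples encoding of the sources and hypothetical helper names (`apply_rule`, `integrate`). It leaves out the elided "papers ..." subobject and is not the MSL interpreter itself.

```python
from collections import defaultdict

# Hypothetical source contents: each object is (label, [(sub_label, value), ...]).
S1 = [("faculty", [("name", "John Doe"), ("rank", "professor")])]
S2 = [("person", [("name", "John Doe"), ("birthday", "April 1")])]


def apply_rule(view, source, body_label):
    """Evaluate  <N faculty {<L V>}> :- <body_label {<name N> <L V>}>@source."""
    for label, subobjects in source:
        if label != body_label:
            continue
        # Bind N to every value of a `name` sub-object.
        for n in [v for l, v in subobjects if l == "name"]:
            # <L V> ranges over all sub-objects, including name itself.
            for l, v in subobjects:
                if (l, v) not in view[n]:
                    view[n].append((l, v))
    return view


def integrate(s1, s2):
    """Apply both rules; objects with the same N fuse into one view object."""
    view = defaultdict(list)
    apply_rule(view, s1, "faculty")
    apply_rule(view, s2, "person")
    return dict(view)


VIEW = integrate(S1, S2)
```

Because both rule heads use N as the object-id, the second rule adds `birthday` to the object the first rule already created instead of producing a second object.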

Page 19: Lessons from the TSIMMIS Project

Mediator Specification Example: Semantics of Rule Bodies

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1
<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

s1: faculty name “John Doe” rank “professor” papers ...
s2: person name “John Doe” birthday “April 1”

Integrated view: faculty name “John Doe” rank “professor” birthday “April 1” papers ...

Page 20: Lessons from the TSIMMIS Project

Mediator Specification Example: Semantics of Rule Heads

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1
<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

s1: faculty name “John Doe” rank “professor” papers ...
s2: person name “John Doe” birthday “April 1”

Integrated view (object-id “John Doe”): faculty name “John Doe” rank “professor” birthday “April 1” papers ...

Page 21: Lessons from the TSIMMIS Project

Incrementally Add to Semantically Identified Object

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1
<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

s1: faculty name “John Doe” rank “professor” papers ...
s2: person name “John Doe” birthday “April 1”

Integrated view (object-id “John Doe”): faculty name “John Doe” rank “professor” birthday “April 1” papers ...

Page 22: Lessons from the TSIMMIS Project

Irregularities & Incomplete Schema Knowledge

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1

s1: faculty name “John Doe” rank “professor” papers
    faculty name “Mary Smith” project “Air DB”
s2: person name “John Doe” birthday “April 1”

Integrated view (object-ids “John Doe” and “Mary Smith”):
    faculty name “John Doe” rank “professor” birthday “April 1” papers
    faculty name “Mary Smith” project “Air DB”

Page 23: Lessons from the TSIMMIS Project

Second Rule Attaches More Subobjects to View Objects

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1
<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

s1: faculty name “John Doe” rank “professor” papers ...
s2: person name “John Doe” birthday “April 1”

Integrated view (object-id “John Doe”): faculty name “John Doe” rank “professor” birthday “April 1” papers ...

Page 24: Lessons from the TSIMMIS Project

Language Expressiveness

• Information fusion problems solved by MSL
– Irregularities
– Incomplete knowledge of source structure
– Transformation of cross-referenced structures
– Inconsistent and redundant data
– Use of arbitrary matching criteria

• Theoretical analysis of expressiveness
– Consider the relational representation of OEM graphs. Then MSL is equivalent to “SQL + special form of transitive closure”

Page 25: Lessons from the TSIMMIS Project

Inconsistent and Redundant Information

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1
<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2 AND NOT <faculty {<name N> <L V1>}>@s1

s1: faculty name “John Doe” rank “associate”
s2: person name “John Doe” rank “assistant”

Integrated view (object-id “John Doe”): faculty name “John Doe” rank “associate”
[The figure also shows rank “assistant”, the conflicting value from s2]
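The AND NOT condition can be read as "take a subobject from s2 only when s1 has no subobject with the same label for that name". The sketch below illustrates that reading under assumptions: the tuple encoding, the function name `fuse_preferring_s1`, and the extra `birthday` value in s2 (added to show a non-conflicting label passing through) are all hypothetical.

```python
# Hypothetical source contents: each object is (label, [(sub_label, value), ...]).
S1 = [("faculty", [("name", "John Doe"), ("rank", "associate")])]
# The birthday value is an illustrative addition, not taken from the slide.
S2 = [("person", [("name", "John Doe"), ("rank", "assistant"), ("birthday", "April 1")])]


def fuse_preferring_s1(s1, s2):
    """Fuse faculty@s1 and person@s2 by name; s1 wins whenever both supply a label."""
    view = {}
    s1_labels = {}  # name -> labels that s1 supplies for that name

    # Rule 1: <N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1
    for label, subs in s1:
        if label != "faculty":
            continue
        for n in [v for l, v in subs if l == "name"]:
            view.setdefault(n, [])
            s1_labels.setdefault(n, set())
            for l, v in subs:
                s1_labels[n].add(l)
                if (l, v) not in view[n]:
                    view[n].append((l, v))

    # Rule 2: same head, body over person@s2 ...
    for label, subs in s2:
        if label != "person":
            continue
        for n in [v for l, v in subs if l == "name"]:
            for l, v in subs:
                # ... AND NOT <faculty {<name N> <L V1>}>@s1
                if l in s1_labels.get(n, set()):
                    continue
                view.setdefault(n, [])
                if (l, v) not in view[n]:
                    view[n].append((l, v))
    return view


VIEW = fuse_preferring_s1(S1, S2)
```

With these inputs the view keeps rank "associate" from s1, drops rank "assistant" from s2, and still picks up the birthday that only s2 knows.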

Page 26: Lessons from the TSIMMIS Project

Overview

• Semistructured Data Representation

• Mediator Generation
– Example of mediator specification
– Language expressiveness
– Implementation and performance

• Wrapper Generation

• Capabilities-Based Rewriting

Page 27: Lessons from the TSIMMIS Project

Mediator Specification Interpreter Architecture

[Figure: components Query Rewriter, Cost-Based Optimizer, and Datamerge Engine. Inputs: Query and Mediator Specification. Intermediate artifacts: logical datamerge program, plan. Outputs and exchanges: Queries to Wrappers, Results, Result]

Page 28: Lessons from the TSIMMIS Project

Query Rewriting When Known Origins of Information

• <N faculty {<salary S>}> :- <faculty {<name N> <salary S>}>@s1
  <N faculty {<rank R>}> :- <person {<name N> <rank R>}>@s2
• <well-paid {<name N> <salary X>}> :- <N faculty {<salary X> <rank assistant>}> AND X>65000

Page 29: Lessons from the TSIMMIS Project

Query Rewriter Pushes Conditions to Sources

• <N faculty {<salary S>}> :- <faculty {<name N> <salary S>}>@s1
  <N faculty {<rank R>}> :- <person {<name N> <rank R>}>@s2
• <well-paid {<name N> <salary X>}> :- <N faculty {<salary X> <rank assistant>}> AND X>65000
• logical datamerge program
  <well-paid {<name N> <salary X>}> :- (<faculty {<name N> <salary X>}> AND X>65000)@s1 AND <person {<name N> <rank assistant>}>@s2

Page 30: Lessons from the TSIMMIS Project

Passing Bindings & Local Join Plans

Passing Bindings
s2: <name N> :- <person {<rank assistant>}>
s1: <salary X> :- <faculty {<name $N> <salary X>}> AND X>65000
[Figure: N is passed between the two source queries]

Local Join
s2: <name N> :- <person {<rank assistant>}>
s1: <a {<s X> <n N>}> :- <faculty {<name N> <salary X>}> AND X>65000
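The two plan shapes can be compared on toy data. The sketch below assumes hypothetical in-memory tables standing in for s1 and s2 and hypothetical function names. It shows only the shape of the plans: one parameterized s1 query per binding of N, versus one query per source with the join done in the mediator.

```python
# Hypothetical toy data standing in for the two sources.
S1_FACULTY = [
    {"name": "Ann", "salary": 70000},
    {"name": "Bob", "salary": 60000},
    {"name": "Cy", "salary": 90000},
]
S2_PERSON = [
    {"name": "Ann", "rank": "assistant"},
    {"name": "Bob", "rank": "assistant"},
    {"name": "Cy", "rank": "professor"},
]


def s2_assistant_names():
    """Source query at s2: names of persons with rank assistant."""
    return [p["name"] for p in S2_PERSON if p["rank"] == "assistant"]


def s1_salary_for(name, threshold):
    """Parameterized source query at s1, with $N bound to `name`."""
    return [f["salary"] for f in S1_FACULTY
            if f["name"] == name and f["salary"] > threshold]


def s1_well_paid(threshold):
    """Unparameterized source query at s1: every (name, salary) above the threshold."""
    return [(f["name"], f["salary"]) for f in S1_FACULTY if f["salary"] > threshold]


def passing_bindings_plan(threshold=65000):
    # One s1 query per binding of N obtained from s2.
    return [(n, x) for n in s2_assistant_names() for x in s1_salary_for(n, threshold)]


def local_join_plan(threshold=65000):
    # One query per source; the mediator joins the two results on name.
    assistants = set(s2_assistant_names())
    return [(n, x) for n, x in s1_well_paid(threshold) if n in assistants]
```

Both plans return the same answer here; which one is cheaper depends on how many bindings s2 produces and how much data s1 would otherwise ship, which is the kind of trade-off a cost-based optimizer weighs.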

Page 31: Lessons from the TSIMMIS Project

Query Decomposition When Unknown Origins of Information

<X faculty {<S Y>}> :- <X faculty {<birthday “1/20”> <S Y>}>

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1
<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

Page 32: Lessons from the TSIMMIS Project

Plan Considers All Possible Sources of birthday

<X faculty {<S Y>}> :- <X faculty {<birthday “1/20”> <S Y>}>

<N faculty {<L V>}> :- <faculty {<name N> <L V>}>@s1
<N faculty {<L V>}> :- <person {<name N> <L V>}>@s2

[Figure: plan over s1 and s2, with name and birthday labels at each source]
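Since the specification does not say which source holds birthday, a plan has to look for it in both and then gather the remaining subobjects from both. The sketch below is one possible reading under assumptions: the source contents, the tuple encoding, and the helper names are hypothetical, and this is not the actual TSIMMIS plan.

```python
# Hypothetical source contents: each object is (label, [(sub_label, value), ...]).
S1 = [
    ("faculty", [("name", "John Doe"), ("rank", "professor")]),
    ("faculty", [("name", "Mary Smith"), ("birthday", "1/20")]),
]
S2 = [("person", [("name", "John Doe"), ("birthday", "1/20")])]
SOURCES = [(S1, "faculty"), (S2, "person")]


def names_matching(source, label, sub_label, value):
    """Names of `label` objects in `source` that have the sub-object (sub_label, value)."""
    found = set()
    for obj_label, subs in source:
        if obj_label == label and (sub_label, value) in subs:
            found.update(v for l, v in subs if l == "name")
    return found


def subobjects_of(source, label, name):
    """All sub-objects of `label` objects in `source` whose name is `name`."""
    out = []
    for obj_label, subs in source:
        if obj_label == label and ("name", name) in subs:
            out.extend(subs)
    return out


def answer(birthday):
    # Step 1: birthday may come from either source, so ask both for matching names.
    names = set()
    for source, label in SOURCES:
        names |= names_matching(source, label, "birthday", birthday)
    # Step 2: for each matching name, collect sub-objects from both sources.
    result = {}
    for n in sorted(names):
        merged = []
        for source, label in SOURCES:
            for pair in subobjects_of(source, label, n):
                if pair not in merged:
                    merged.append(pair)
        result[n] = merged
    return result
```

In this toy data John Doe's birthday is known only to s2 while his rank is known only to s1, so a plan that consulted one source would either miss him or return him incomplete.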

Page 33: Lessons from the TSIMMIS Project

Overview

• Semistructured-Data Representation

• Mediator Generation

• Wrapper Generation

• Capabilities-Based Rewriting

Page 34: Lessons from the TSIMMIS Project

Query Translation in Wrappers

Incoming queries:
SELECT * FROM person
SELECT * FROM person WHERE name=“Smith”

Source commands:
find -all
find -n Smith

[Figure: the Wrapper contains a Query Translator and a Result Translator and sits in front of the Source]

Page 35: Lessons from the TSIMMIS Project

Rapid Query Translation Using Templates and Actions

Templates and actions:
SELECT * FROM person {emit “find -all”}
SELECT * FROM person WHERE name=$N {emit “find -n $N”}

Incoming queries:
SELECT * FROM person
SELECT * FROM person WHERE name=“Smith”

Source commands:
find -all
find -n Smith

[Figure: Template Interpreter and Result Translator in front of the Source]
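The template-and-action idea can be imitated with regular expressions: each template is a pattern with a placeholder, and each action is the command text to emit. This is a minimal sketch, assuming straight double quotes around string literals and a hypothetical `translate` function. The actual wrapper specification language is not shown in the slides beyond the two templates above.

```python
import re

# Each entry pairs a query template (as a regex) with the command to emit.
# The named group N plays the role of the $N placeholder.
TEMPLATES = [
    (r'SELECT \* FROM person', "find -all"),
    (r'SELECT \* FROM person WHERE name="(?P<N>[^"]+)"', "find -n {N}"),
]


def translate(query):
    """Return the source command for `query`, or None if no template matches."""
    normalized = " ".join(query.split())  # collapse runs of whitespace
    for pattern, action in TEMPLATES:
        match = re.fullmatch(pattern, normalized)
        if match:
            return action.format(**match.groupdict())
    return None  # the query is not supported by this wrapper
```

A query that matches no template yields `None`, which is the signal a mediator would need in order to know that the wrapper cannot answer it directly.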

Page 36: Lessons from the TSIMMIS Project

Description of Infinite Sets of Supported Queries

• uses recursive nonterminals

• Example:
– job description contains word w1 and word w2 and ...
– SELECT subset(person) FROM person WHERE \CJob
  \CJob : job LIKE $W AND \CJob
  \CJob : TRUE
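A recursive nonterminal such as \CJob describes infinitely many queries with two productions, and a small recursive matcher can decide whether a given query belongs to the set. The sketch below is an illustration under assumptions: single-quoted string literals, exact keyword spelling, and the function names `match_cjob` and `supported` are all hypothetical.

```python
import re

PREFIX = "SELECT subset(person) FROM person WHERE "
# One step of the recursive production: job LIKE $W AND <rest>
LIKE_STEP = re.compile(r"job LIKE '([^']*)' AND (.*)")


def match_cjob(text):
    """Return the list of $W bindings if `text` derives from the CJob nonterminal, else None."""
    if text == "TRUE":                      # production  \CJob : TRUE
        return []
    step = LIKE_STEP.fullmatch(text)        # production  \CJob : job LIKE $W AND \CJob
    if step is None:
        return None
    rest = match_cjob(step.group(2))
    return None if rest is None else [step.group(1)] + rest


def supported(query):
    """Return the words bound to $W if the query is in the described set, else None."""
    normalized = " ".join(query.split())
    if not normalized.startswith(PREFIX):
        return None
    return match_cjob(normalized[len(PREFIX):])
```

Any number of `job LIKE` conjuncts is accepted as long as the chain ends in TRUE, which is how two productions cover an unbounded family of queries.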

Page 37: Lessons from the TSIMMIS Project

Overview

• Semistructured-Data Representation

• Mediator Generation

• Wrapper Generation

• Capabilities-Based Rewriting

Page 38: Lessons from the TSIMMIS Project

Capabilities-Based Rewriter in Mediator Architecture

[Figure: components Query Rewriter, Capabilities-Based Rewriter, Cost-Based Optimizer, and Datamerge Engine. Inputs: Query, Mediator Specification, Wrapper Supported Queries Description. Intermediate artifacts: logical datamerge program, supported plans, optimal plan]

Page 39: Lessons from the TSIMMIS Project

Capabilities-Based Rewriter Finds Supported Plans

Supported Queries

SELECT * FROM A WHERE salary>65000

SELECT * FROM A

Page 40: Lessons from the TSIMMIS Project

Capabilities-Based Rewriter Finds Most-Selective Supported Plans

Supported Queries

SELECT * FROM B WHERE salary>65000

SELECT * FROM B
SELECT * FROM B WHERE salary >65000
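The rewriter's choice can be sketched as follows: push the selection to the wrapper when the wrapper supports it, otherwise fetch everything and filter in the mediator. The sketch assumes that source A accepts only the unfiltered query (as in the earlier slide noting that (A) does not allow selection) and that source B accepts the salary selection. The table contents, the `CAPABILITIES` dictionary, and all function names are hypothetical.

```python
# Hypothetical toy data and capability descriptions for two sources.
ROWS = {
    "A": [{"name": "Ann", "salary": 70000}, {"name": "Bob", "salary": 60000}],
    "B": [{"name": "Cy", "salary": 90000}, {"name": "Dee", "salary": 50000}],
}
CAPABILITIES = {
    "A": {"select_salary": False},  # supports only SELECT * FROM A
    "B": {"select_salary": True},   # also supports ... WHERE salary > $X
}


def wrapper_query(source, min_salary=None):
    """Simulate a wrapper; reject queries outside its supported set."""
    if min_salary is not None and not CAPABILITIES[source]["select_salary"]:
        raise ValueError(f"source {source} does not allow selection")
    rows = ROWS[source]
    if min_salary is None:
        return list(rows)
    return [r for r in rows if r["salary"] > min_salary]


def plan(source, min_salary):
    """Return (threshold pushed to the wrapper, threshold applied in the mediator)."""
    if CAPABILITIES[source]["select_salary"]:
        return (min_salary, None)   # most selective supported plan
    return (None, min_salary)       # fetch everything, filter in the mediator


def execute(source, min_salary):
    pushed, local = plan(source, min_salary)
    rows = wrapper_query(source, pushed)
    if local is not None:
        rows = [r for r in rows if r["salary"] > local]
    return rows
```

Both sources return the correct answer, but for B the filtering happens at the source, so less data crosses the wrapper boundary.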

Page 41: Lessons from the TSIMMIS Project

Capabilities-Based Rewriter Architecture

[Figure: stages Component SubQuery Discovery, Plan Construction, and Plan Refinement. Inputs: Query and Query Capabilities Description. Intermediate artifacts: Component SubQueries, Plans (not fully optimized). Output: Algebraically optimal plans]

Page 42: Lessons from the TSIMMIS Project

What TSIMMIS Achieved

• system for integration of heterogeneous sources

• challenges and solutions
– semistructured data & incomplete schema knowledge
  • appropriate specification language and query processing algorithms
– limited and different query capabilities
  • query translation algorithm
  • capabilities-based query rewriting algorithm

Page 43: Lessons from the TSIMMIS Project

Overview

• TSIMMIS’ goals, technical challenges, and solutions

• Insufficiencies of the TSIMMIS’ framework

• Going forward

Page 44: Lessons from the TSIMMIS Project

Insufficiencies of the TSIMMIS framework

• OEM was really unstructured data
– some loose and partial schematic info may pay off tremendously

• too “databasy” user/mediator/source interaction

Page 45: Lessons from the TSIMMIS Project

Overview

• TSIMMIS’ goals, technical challenges, and solutions

• Insufficiencies of the TSIMMIS’ framework

• Going forward

Page 46: Lessons from the TSIMMIS Project

Web emerges as a Distributed DB and XML as its Data Model

[Figure: Data Source, Native XML Database, and Legacy Source with a Wrapper, each associated with XML View Document(s); XMAS Query Language]

Also export:
1. Schemas & Metadata (XML-Data, RDF,…)
2. Description of supported queries

Page 47: Lessons from the TSIMMIS Project

Definition of Integrated Views

[Figure: three Data Sources, each with XML View Document(s); a Mediator with an Integrated XML View and a View Definition in XMAS]

Page 48: Lessons from the TSIMMIS Project

Non-Materialized Views in the MIX mediator system

[Figure: Application and Blended Browsing & Querying (BBQ) GUI; DOM for Virtual XML Doc’s; XMAS query and XML document; MIX Mediator with Query Processor and DTD Inference; View Definition in XMAS; Integrated View DTD; two XML Sources; Source DTD]

Page 49: Lessons from the TSIMMIS Project

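[Figure: MIX Mediator internals. Clients: Application and Blended Browsing & Querying (BBQ) GUI, using the DOM (VXD) Client API with XMAS Query and XML Document Fragments. Mediator inputs and artifacts: XMAS Mediator View Definition, View DTD, Unfolded Query. Processing steps shown: Resolution, Simplification, Translation to Algebra, Optimization, Execution, DTD Inference. Sources: XML Source 1 with DTD, RDB2XML Wrapper with DTD Inference, exchanging XMAS Query and XML Document Fragments]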