Tools for XML Data Exchange. Dan Suciu, AT&T Labs. Joint work with Mary Fernandez.

Dec 27, 2015

Stuart Bradley
Page 1

Tools for XML Data Exchange

Dan Suciu, AT&T Labs

Joint work with Mary Fernandez

Page 2

XML Has Many Facets

• XML for fancier Web pages

– XML generated with structural editors

• XML for messaging

– generated during applications

• XML for Data Exchange

– generated from legacy data

Page 3

XML in Data Exchange

• communities agree on common DTD

• export their data in XML

• exchange it over HTTP

• applications understand only that DTD

Page 4

An Example of XML Data

<book>
  <publisher> Addison-Wesley </publisher>
  <author> Serge Abiteboul </author>
  <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
  <author> Victor Vianu </author>
  <title> Foundations of Databases </title>
  <year> 1995 </year>
</book>

<book>
  <publisher> Freeman </publisher>
  <author> Jeffrey D. Ullman </author>
  <title> Principles of Database and Knowledge Base Systems </title>
  <year> 1998 </year>
</book>
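The example shows that an author can be either plain text or a nested first-name/last-name pair, so code that reads this data has to cope with both shapes. A minimal Python sketch of that, assuming the two books are wrapped in a hypothetical <bib> root (not on the slide) so that they form one document:

```python
import xml.etree.ElementTree as ET

# The two <book> elements from the slide, wrapped in a hypothetical <bib>
# root element so the fragment parses as a single XML document.
BIB_XML = """
<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book>
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>
"""


def author_name(author):
    """Return the author's name for both the flat and the nested form."""
    # itertext() yields the text of the element and of all its descendants;
    # split/join normalizes the surrounding whitespace.
    return " ".join("".join(author.itertext()).split())


def titles_and_authors(xml_text):
    """Return one (title, [author names]) pair per <book>."""
    root = ET.fromstring(xml_text)
    return [
        (book.findtext("title").strip(),
         [author_name(a) for a in book.findall("author")])
        for book in root.findall("book")
    ]


books = titles_and_authors(BIB_XML)
for title, authors in books:
    print(title, "by", ", ".join(authors))
```

The irregular author structure is the kind of variation a relational schema would have to anticipate up front, while the XML reader only needs to tolerate it.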

Page 5

XML Exchange Vision

[Figure labels: sources (relational data, legacy data, object-relational); XML Data on the WEB (HTTP); Transform, Integrate, Warehouse; applications]

Page 6

Tools

• export legacy data to XML
  – RXL
• query/transform/integrate XML data
  – XML-QL
• compress XML data
  – XMill
• store/process incoming XML data
  – STORED

Page 7

XML-QL: A Query Language for XML

• http://www.w3.org/TR/NOTE-xml-ql (8/98)

• new W3C Working Group on query languages (9/99)

• XML-QL characteristics:
  – relationally complete (like SQL)
  – XML input, XML output
  – queries, transforms, integrates XML data

[Deutsch et al., 1999 (WWW8)]

Page 8

Querying in XML-QL

where <book language="french">
        <publisher> <name> Morgan Kaufmann </name> </publisher>
        <author> $a </author>
      </book> in "www.a.b.c/bib.xml"
construct $a

Pattern: the <book> ... </book> expression in the where clause
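To make the pattern-matching reading concrete, the query can be emulated in Python: every <book> that matches the pattern contributes one binding of $a per <author>. This is only a sketch of the semantics, not an XML-QL implementation; the sample document and its author names are hypothetical stand-ins for www.a.b.c/bib.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of www.a.b.c/bib.xml.
BIB_XML = """
<bib>
  <book language="french">
    <publisher> <name> Morgan Kaufmann </name> </publisher>
    <author> A. Dupont </author>
    <author> B. Martin </author>
  </book>
  <book language="english">
    <publisher> <name> Morgan Kaufmann </name> </publisher>
    <author> C. Smith </author>
  </book>
  <book language="french">
    <publisher> <name> Freeman </name> </publisher>
    <author> D. Leroy </author>
  </book>
</bib>
"""


def french_mk_authors(xml_text):
    """Return every binding of $a produced by the pattern in the where clause."""
    root = ET.fromstring(xml_text)
    bindings = []
    for book in root.iter("book"):
        # <book language="french"> ... : the attribute must match.
        if book.get("language") != "french":
            continue
        # <publisher> <name> Morgan Kaufmann </name> </publisher>
        names = [(n.text or "").strip() for n in book.findall("publisher/name")]
        if "Morgan Kaufmann" not in names:
            continue
        # <author> $a </author>: one binding per author of a matching book.
        bindings.extend((a.text or "").strip() for a in book.findall("author"))
    return bindings


authors = french_mk_authors(BIB_XML)
print(authors)
```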

Page 9

Transformations in XML-QL

Note: </> abbreviates </book> or </result> or ...

where <book language = $l> <author> $a </> </> in "www.a.b.c/bib.xml"
construct <result> <author> $a </> <lang> $l </> </>

Result:

<result> <author>. . .</author> <lang>. . .</lang> </result>
<result> <author>. . .</author> <lang>. . .</lang> </result>
<result> <author>. . .</author> <lang>. . .</lang> </result>

Template: the <result> ... </> expression in the construct clause
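The transformation can be read as "for each binding of ($a, $l), instantiate the template once". A rough Python sketch of that reading follows; the input data and the <results> wrapper element are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input standing in for www.a.b.c/bib.xml.
BIB_XML = """<bib>
  <book language="french"><author>A. Dupont</author><author>B. Martin</author></book>
  <book language="english"><author>C. Smith</author></book>
</bib>"""


def transform(xml_text):
    """Instantiate the <result> template once per ($a, $l) binding."""
    root = ET.fromstring(xml_text)
    out = ET.Element("results")  # hypothetical wrapper, not part of the query
    for book in root.iter("book"):
        lang = book.get("language")
        if lang is None:
            continue  # the pattern requires a language attribute
        for author in book.findall("author"):
            result = ET.SubElement(out, "result")
            ET.SubElement(result, "author").text = author.text
            ET.SubElement(result, "lang").text = lang
    return out


results = transform(BIB_XML)
for r in results:
    print(ET.tostring(r, encoding="unicode"))
```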

Page 10

Transformations in XML-QL

where <book language = $l> <author> $a </> </> in "www.a.b.c/bib.xml"
construct <result> <author id=F($a)> $a </> <lang> $l </> </>

Result:

<result> <author>. . .</author> <lang>. . .</lang> <lang>. . .</lang> </result>
<result> <author>. . .</author> <lang>. . .</lang> <lang>. . .</lang> </result>

Skolem Functions in Templates

Page 11

Data Integration in XML-QL

{ where <book> <isbn> $n </> <title> $t </> </> in "www.books.com"
  construct <result id=F($n)> <title> $t </> </> }

{ where <review> <isbn> $n </> <review> $r </> </> in "www.reviews.com"
  construct <result id=F($n)> <review> $r </> </> }

Result:

<result id=".."> <title>. . .</title> <review>. . .</review> <review>. . .</review> </result>
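Because both blocks construct <result id=F($n)>, bindings with the same ISBN contribute to the same result element, which is how the two sources get fused. A small Python sketch of that idea, using a dictionary keyed by ISBN in place of the Skolem function F; the sample data and the helper name skolem_result are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two sources named in the query.
BOOKS_XML = """<books>
  <book><isbn>111</isbn><title>Title A</title></book>
  <book><isbn>222</isbn><title>Title B</title></book>
</books>"""

REVIEWS_XML = """<reviews>
  <review><isbn>111</isbn><review>Good</review></review>
  <review><isbn>111</isbn><review>Dense</review></review>
</reviews>"""


def integrate(books_xml, reviews_xml):
    """Merge titles and reviews into one <result> per ISBN."""
    results = {}  # ISBN -> the unique <result> element for F(ISBN)

    def skolem_result(isbn):
        # Same argument, same element: this mimics id=F($n).
        if isbn not in results:
            results[isbn] = ET.Element("result", id=f"F({isbn})")
        return results[isbn]

    for book in ET.fromstring(books_xml).findall("book"):
        res = skolem_result(book.findtext("isbn").strip())
        ET.SubElement(res, "title").text = book.findtext("title").strip()

    for rev in ET.fromstring(reviews_xml).findall("review"):
        res = skolem_result(rev.findtext("isbn").strip())
        ET.SubElement(res, "review").text = rev.findtext("review").strip()

    return list(results.values())


merged = integrate(BOOKS_XML, REVIEWS_XML)
for r in merged:
    print(ET.tostring(r, encoding="unicode"))
```

A book without reviews still yields a result with only a title, so the merge behaves like an outer join on the ISBN.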

Page 12

RXL: Export Legacy Data To XML

• legacy data
  – fragmented into many flat relations
  – 3rd normal form
  – schema is proprietary

• XML data
  – nested
  – un-normalized
  – schema designed by agreement

Page 13

RXL: An Example

• relational database:

  Store(sid, name)
  SB(sid, bid)
  Book(bid, title)

• virtual XML view:

<store> <name> n1 </name> <book> ... </book> <book> ... </book> ... </store>
<store> <name> n2 </name> <book> ... </book> <book> ... </book> … </store>

Page 14

A Simple RXL Query

• specify XML view declaratively

from Store, SB, Book
where Store.sid = SB.sid and SB.bid = Book.bid
construct <store ID=f(Store.sid)>
            <name> Store.name </name>
            <book> Book.title </book>
          </store>
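One way to picture what the view denotes is to materialize it: run the join in SQL and group the rows by store while building XML. The sketch below does that with an in-memory SQLite database. The table contents, the <view> wrapper, and the choice to emit <name> once per store (mirroring the view shown on the earlier slide) are assumptions; this is not how RXL itself is implemented.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db():
    """Create the three relations with a few hypothetical rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Store(sid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Book(bid INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE SB(sid INTEGER, bid INTEGER);
        INSERT INTO Store VALUES (1, 'n1'), (2, 'n2');
        INSERT INTO Book VALUES (10, 'The Calculus'), (11, 'Algebra');
        INSERT INTO SB VALUES (1, 10), (1, 11), (2, 11);
    """)
    return db


def materialize_view(db):
    """Build one <store> per Store.sid, with one <book> per joined row."""
    rows = db.execute("""
        SELECT Store.sid, Store.name, Book.title
        FROM Store, SB, Book
        WHERE Store.sid = SB.sid AND SB.bid = Book.bid
        ORDER BY Store.sid, Book.bid
    """)
    root = ET.Element("view")  # hypothetical wrapper element
    stores = {}  # sid -> <store>, playing the role of ID=f(Store.sid)
    for sid, name, title in rows:
        if sid not in stores:
            store = ET.SubElement(root, "store", ID=f"f({sid})")
            ET.SubElement(store, "name").text = name
            stores[sid] = store
        ET.SubElement(stores[sid], "book").text = title
    return root


view = materialize_view(build_db())
print(ET.tostring(view, encoding="unicode"))
```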

Page 15

RXL: Querying the XML View

• users ask XML-QL queries:
  – find stores that sell "The Calculus"

where <store> <name> $n </name> <book> The Calculus </book> </store>
construct <result> $n </result>

Page 16

RXL: Query composition

• system composes the query with the view:

from Store, SB, Book
where Store.sid = SB.sid and SB.bid = Book.bid and Book.title = "The Calculus"
construct <result> Store.name </result>

[Figure: the relations Store(sid, name), SB(sid, bid), Book(bid, title) on one side and the virtual <store> view on the other, with arrows labeled RXL and XML-QL]
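The point of composition is that the XML view is never materialized: the user's XML-QL query is answered by a single relational query. A hedged Python sketch of running the composed query directly against the tables, with hypothetical rows in an in-memory SQLite database:

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db():
    """Create Store, SB, and Book with a few hypothetical rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Store(sid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Book(bid INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE SB(sid INTEGER, bid INTEGER);
        INSERT INTO Store VALUES (1, 'n1'), (2, 'n2');
        INSERT INTO Book VALUES (10, 'The Calculus'), (11, 'Algebra');
        INSERT INTO SB VALUES (1, 10), (1, 11), (2, 11);
    """)
    return db


def stores_selling(db, title):
    """Run the composed query and wrap each store name in <result>."""
    rows = db.execute(
        """
        SELECT Store.name
        FROM Store, SB, Book
        WHERE Store.sid = SB.sid AND SB.bid = Book.bid AND Book.title = ?
        ORDER BY Store.sid
        """,
        (title,),
    )
    results = []
    for (name,) in rows:
        result = ET.Element("result")
        result.text = name
        results.append(ET.tostring(result, encoding="unicode"))
    return results


answers = stores_selling(build_db(), "The Calculus")
print(answers)
```

Only the selection on Book.title and the projection on Store.name reach the database, so no <store> or <book> elements are built for stores that do not qualify.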

Page 17

Compressing XML Data

• for exchange and archiving

• can use general tool (gzip)

• but a specialized tool is twice as good (XMill)

Page 18

XMill Example: Weblogs

202.239.238.16|GET / HTTP/1.0|text/html|200|1997/10/01-00:00:02|-|4478 |-|-|http://www02.so-net.or.jp/|Mozilla/3.01 [ja] (Win95; I)

<apache:entry>
  <apache:host>202.239.238.16</apache:host>
  <apache:requestLine>GET / HTTP/1.0</apache:requestLine>
  <apache:contentType>text/html</apache:contentType>
  <apache:statusCode>200</apache:statusCode>
  <apache:date>1997/10/01-00:00:02</apache:date>
  <apache:byteCount>4478</apache:byteCount>
  <apache:referer>http://www02.so-net.or.jp/</apache:referer>
  <apache:userAgent>Mozilla/3.01 [ja] (Win95; I)</apache:userAgent>
</apache:entry>

Page 19

XMill Example: Weblogs

File            Size
weblog.dat      15.9MB
weblog.dat.gz   1.6MB
weblog.xml      24.2MB
weblog.xml.gz   2.1MB

Command                                           Output        Size
xmill -p // weblog.xml weblog1.xmi                weblog1.xmi   1.75MB
xmill weblog.xml weblog2.xmi                      weblog2.xmi   1.33MB
xmill -f settings.pz weblog.xml weblog3.xmi       weblog3.xmi   0.82MB

Page 20

XMill: Fine Tuning the Compression

-p//apache:host=>seqcomb(u8 "." u8 "." u8 "." u8)
-p//apache:userAgent=>seq(e "/" e)
-p//apache:byteCount=>u
-p//apache:statusCode=>e
-p//apache:contentType=>e
-p//apache:requestLine=>seq("GET " rep("/" e) " HTTP/1." e)
-p//apache:date=>seq(u "/" u8 "/" u8 "-" u8 ":" di ":" di)
-p//apache:referer=>or(seq("file:" t) seq("http://" or(seq(rep("." e) "/" rep("/" e)) rep("." e))) t)
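Each option above pairs a path such as //apache:host with a compressor for the values found there. The underlying idea, separating the tag structure from the data and grouping similar values before compressing them, can be sketched in Python with zlib. This toy version groups by tag name only and bears no relation to XMill's actual file format or its semantic compressors; the sample log uses unprefixed tag names and a hypothetical second entry.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical two-entry log; the first entry reuses values from the slide.
LOG_XML = """<log>
  <entry><host>202.239.238.16</host><statusCode>200</statusCode><byteCount>4478</byteCount></entry>
  <entry><host>202.239.238.17</host><statusCode>404</statusCode><byteCount>512</byteCount></entry>
</log>"""


def split_into_containers(xml_text):
    """Return (structure tokens, {tag: [text values]}) for the document."""
    structure = []
    containers = defaultdict(list)

    def walk(elem):
        structure.append("<" + elem.tag)  # open-tag token
        text = (elem.text or "").strip()
        if text:
            containers[elem.tag].append(text)  # group values by tag
        for child in elem:
            walk(child)
        structure.append(">")  # close-tag token

    walk(ET.fromstring(xml_text))
    return structure, dict(containers)


def compress_containers(structure, containers):
    """Compress the structure stream and each container independently."""
    blobs = {"#structure": zlib.compress("\n".join(structure).encode("utf-8"))}
    for tag, values in containers.items():
        blobs[tag] = zlib.compress("\n".join(values).encode("utf-8"))
    return blobs


structure, containers = split_into_containers(LOG_XML)
blobs = compress_containers(structure, containers)
print({tag: len(blob) for tag, blob in blobs.items()})
```

On a two-entry sample the per-container overhead dominates, so the sizes printed here say nothing about real compression ratios; the sketch only shows how values of the same kind end up next to each other.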

Page 21

Storing XML Data

• Scenario:
  – receive a large XML data instance
  – want to store and manage it

• Could build an XML management system from scratch (eXcelon)

• Preferably: use existing database systems

Page 22

Storing XML: Ternary Relation

[Florescu, Kossmann 1999]

[Figure: graph with root &o1; a paper edge leads to &o2, which has title, author, author, and year edges to the leaves &o3, &o4, &o5, &o6 holding "The Calculus", "…", "…", "1986"]

Ref

Source  Label   Dest
&o1     paper   &o2
&o2     title   &o3
&o2     author  &o4
&o2     author  &o5
&o2     year    &o6

Val

Node  Value
&o3   The Calculus
&o4   …
&o5   …
&o6   1986
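The mapping from an XML tree to the two relations is mechanical: every element becomes an edge (Source, Label, Dest), and every text leaf becomes a (Node, Value) row. A minimal Python sketch, assuming pre-order numbering of object identifiers and keeping the slide's elided author values as the literal placeholder "…":

```python
import itertools
import xml.etree.ElementTree as ET

# The paper from the figure; author values are elided on the slide and are
# kept here as the literal placeholder "…".
PAPER_XML = """<paper>
  <title>The Calculus</title>
  <author>…</author>
  <author>…</author>
  <year>1986</year>
</paper>"""


def shred(xml_text):
    """Return (ref, val): edge rows (source, label, dest) and leaf rows (node, value)."""
    ref, val = [], []
    counter = itertools.count(1)

    def new_oid():
        return f"&o{next(counter)}"

    def visit(elem, parent_oid):
        oid = new_oid()
        ref.append((parent_oid, elem.tag, oid))  # one edge per element
        text = (elem.text or "").strip()
        if text:
            val.append((oid, text))  # one value row per text leaf
        for child in elem:
            visit(child, oid)

    root_oid = new_oid()  # &o1, the node the document hangs from
    visit(ET.fromstring(xml_text), root_oid)
    return ref, val


ref, val = shred(PAPER_XML)
for row in ref:
    print(*row)
for row in val:
    print(*row)
```

Any document fits these two tables, but reassembling an element takes one self-join on Ref per level of nesting.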

Page 23

Storing XML: Derive Schema from DTD

• DTD:

<!ELEMENT employee (name, address, project*)>
<!ELEMENT address (street, city, state, zip)>

• ODMG classes:

class Employee public type tuple (name:string, address:Address, project:List(Project))
class Address public type tuple (street:string, …)

• [Christophides et al. 1994, Shanmugasundaram et al. 1999]
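The same derivation can be mimicked with Python dataclasses: each element declaration becomes a class, a starred child becomes a list, and a loader fills the objects from a conforming document. This is an illustrative analogue of the ODMG classes, not the cited algorithms. The DTD does not declare project, so treating it as a plain string is an assumption, and the sample employee is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Address:
    # <!ELEMENT address (street, city, state, zip)>
    street: str
    city: str
    state: str
    zip: str


@dataclass
class Employee:
    # <!ELEMENT employee (name, address, project*)>
    name: str
    address: Address
    project: List[str] = field(default_factory=list)  # project* -> list


def load_employee(xml_text):
    """Build an Employee from an <employee> element that follows the DTD."""
    e = ET.fromstring(xml_text)
    a = e.find("address")
    address = Address(
        *(a.findtext(tag).strip() for tag in ("street", "city", "state", "zip"))
    )
    return Employee(
        name=e.findtext("name").strip(),
        address=address,
        project=[(p.text or "").strip() for p in e.findall("project")],
    )


# Hypothetical instance document.
EMPLOYEE_XML = """<employee>
  <name>Jane Doe</name>
  <address><street>1 Main St</street><city>Springfield</city><state>NJ</state><zip>07000</zip></address>
  <project>alpha</project>
  <project>beta</project>
</employee>"""

employee = load_employee(EMPLOYEE_XML)
print(employee)
```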

Page 24

STORED Approach: Mine Data to Derive Schema

[Deutsch et al. 1999]

[Figure: a forest of paper elements; each paper has author children (with fn and ln), a title, and in one case a year]

Derived relations (labeled Paper1 and Paper2 on the slide):

author  title
X       X

fn1  ln1  fn2  ln2  title  year
X    X    X    X    X      -
X    X    -    -    X      X
X    X    -    -    X      -
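A much simplified way to think about mining a schema from data: count how often each child label occurs across the paper elements, turn the frequent ones into columns, and keep whatever does not fit in an overflow list. The Python sketch below illustrates only that idea; the support threshold, the overflow representation, and the sample papers are assumptions, and the actual STORED algorithm is considerably more elaborate.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical collection of papers with irregular structure.
PAPERS_XML = """<papers>
  <paper><author>A</author><author>B</author><title>T1</title><year>1986</year></paper>
  <paper><author>C</author><title>T2</title></paper>
  <paper><author>D</author><title>T3</title></paper>
</papers>"""


def mine_columns(papers, min_support):
    """Return child labels that occur in at least min_support of the papers."""
    counts = Counter()
    for paper in papers:
        counts.update({child.tag for child in paper})  # count once per paper
    n = len(papers)
    return sorted(tag for tag, c in counts.items() if c / n >= min_support)


def to_rows(papers, columns):
    """Store the first value of each frequent label in a row; overflow the rest."""
    rows, overflow = [], []
    for i, paper in enumerate(papers):
        row = {col: None for col in columns}
        for child in paper:
            text = (child.text or "").strip()
            if child.tag in row and row[child.tag] is None:
                row[child.tag] = text
            else:
                overflow.append((i, child.tag, text))
        rows.append(row)
    return rows, overflow


papers = ET.fromstring(PAPERS_XML).findall("paper")
columns = mine_columns(papers, min_support=0.6)
rows, overflow = to_rows(papers, columns)
print(columns)
print(rows)
print(overflow)
```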

Page 25

Summary

• XML - simple (?), lightweight syntax

• Challenge: build bridges to existing database tools

• XML in data exchange: YES

• XML as a new data model: NO

Page 26

More Info

http://www.research.att.com/~suciu

Data on the Web:

From Relational to Semistructured to XML

Morgan Kaufmann, 1999