Models and languages for semistructured data Bridging documents and databases

Dec 22, 2015

Page 1: Models and languages for semistructured data Bridging documents and databases.

Models and languages for semistructured data

Bridging documents and databases

Page 2: Models and languages for semistructured data Bridging documents and databases.

Lectures

1. Introduction to data models
2. Query languages for relational databases
3. Models and query languages for object databases
4. Models and query languages for semistructured data, XML
5. Embedded query languages
6. Guest lecture on Object Role Modelling

Page 3: Models and languages for semistructured data Bridging documents and databases.

Why do we like types?

Types facilitate understanding

Types enable compact representations

Types enable query optimisation

Types facilitate consistency enforcement

Page 4: Models and languages for semistructured data Bridging documents and databases.

Background assumptions for typed data

Data stable over time

Organisational body to control data

Exercise: Give an example of a context where these assumptions do not hold

Page 5: Models and languages for semistructured data Bridging documents and databases.

Semistructured data

Semistructured data is schemaless and self-describing

The data and the description of the data are integrated

Page 6: Models and languages for semistructured data Bridging documents and databases.

An example

{name: {first: “John”, last: “Smith”}, tel: 112233, email: “[email protected]”}

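[Figure: the same data drawn as a tree. The edges name, tel and email leave the root; name has the child edges first and last; the leaves hold “John”, “Smith”, 112233 and the email address.]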

Page 7: Models and languages for semistructured data Bridging documents and databases.

Another example

[Figure: two person edges lead to the objects &o1 and &o2. &o1 has name “Eva”, age 40 and a child edge pointing to &o2; &o2 has name “Abel” and age 20.]

{person: &o1{name: “Eva”, age: 40, child: &o2},
 person: &o2{name: “Abel”, age: 20}}

An object identifier, such as &o1, before a structure, binds the object identifier to the identity of that structure. The object identifier can then be used to refer to the structure.

Page 8: Models and languages for semistructured data Bridging documents and databases.

Terminology

The following is an ssd-expression:

&o1{name: “Eva”, age: 40, child: &o2}

Object identifier: &o1. Labels: name, age, child. Values: “Eva”, 40, &o2.

Page 9: Models and languages for semistructured data Bridging documents and databases.

A database

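[Figure: the database db as an edge-labelled graph. The root edge biblio leads to a paper node n1 (authors Crick and Wallace, title DNAspiral, date 1956), a book node n2 (author Darwin, title Origin, date 1848), a book node n3 (author Marx, title Kapital, date 1860), and further entries.]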

Page 10: Models and languages for semistructured data Bridging documents and databases.

Path expressions

A path expression is a sequence of labels: l1.l2…ln

A path expression results in a set of nodes

Path properties are specified by regular expressions on two levels: on the alphabet of labels and on the alphabet of characters that comprise labels
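For the simplest case, a plain sequence of labels, evaluation can be sketched as repeated edge following. The encoding of the bibliography database as nested (label, value) lists and the function name `eval_path` are assumptions made for this illustration; a list stands in for the set of result nodes.

```python
# Assumed encoding of the bibliography database as nested (label, value) lists.
db = [
    ("biblio", [
        ("paper", [("author", "Crick"), ("author", "Wallace"),
                   ("title", "DNAspiral"), ("date", 1956)]),
        ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
        ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
    ]),
]

def eval_path(root, path):
    """Return the nodes reached from root by following the labels in path."""
    nodes = [root]
    for label in path.split("."):
        # Keep every value that hangs below a current node on a matching edge.
        nodes = [value
                 for node in nodes if isinstance(node, list)
                 for (lbl, value) in node if lbl == label]
    return nodes

print(eval_path(db, "biblio.book.author"))  # ['Darwin', 'Marx']
```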

Page 11: Models and languages for semistructured data Bridging documents and databases.

A path expression

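[Figure: the database db as an edge-labelled graph. The root edge biblio leads to a paper node n1 (authors Crick and Wallace, title DNAspiral, date 1956), a book node n2 (author Darwin, title Origin, date 1848), a book node n3 (author Marx, title Kapital, date 1860), and further entries.]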

biblio.book.author

Page 12: Models and languages for semistructured data Bridging documents and databases.

A path expression

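[Figure: the database db as an edge-labelled graph. The root edge biblio leads to a paper node n1 (authors Crick and Wallace, title DNAspiral, date 1956), a book node n2 (author Darwin, title Origin, date 1848), a book node n3 (author Marx, title Kapital, date 1860), and further entries.]

biblio.(book | paper).author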

Page 13: Models and languages for semistructured data Bridging documents and databases.

Examples of path expressions

biblio.book.author - authors of books

biblio.paper.author - authors of papers

biblio.(book | paper).author - authors of books or papers

biblio._.author - authors of anything

biblio._*.author - nodes at the ends of paths starting with biblio, ending with author, and having an arbitrary sequence of labels between
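The path expressions listed here can be tried out with a rough Python sketch that translates each one into a regular expression over dot-separated label paths. The translation rules, the tree-shaped (acyclic) data, and the names `walk`, `compile_path` and `eval_path` are all assumptions of this sketch, not part of the lecture's query language.

```python
import re

# Assumed encoding of the bibliography database as nested (label, value) lists.
db = [
    ("biblio", [
        ("paper", [("author", "Crick"), ("author", "Wallace"),
                   ("title", "DNAspiral"), ("date", 1956)]),
        ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
        ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
    ]),
]

def walk(node, prefix=()):
    """Yield (label_path, node) for every node below the root (tree data only)."""
    if isinstance(node, list):
        for label, value in node:
            path = prefix + (label,)
            yield path, value
            yield from walk(value, path)

def compile_path(pattern):
    """Translate a path expression into a regex over 'l1.l2.…ln.' strings."""
    parts = []
    for comp in pattern.split("."):
        comp = comp.replace(" ", "")
        if comp == "_*":                      # any sequence of labels
            parts.append(r"(?:[^.]+\.)*")
        elif comp == "_":                     # exactly one arbitrary label
            parts.append(r"[^.]+\.")
        else:                                 # a label or a regex on labels
            parts.append("(?:" + comp + r")\.")
    return re.compile("".join(parts))

def eval_path(root, pattern):
    """Return the nodes whose label path from the root matches the pattern."""
    regex = compile_path(pattern)
    return [node for path, node in walk(root)
            if regex.fullmatch(".".join(path) + ".")]

print(eval_path(db, "biblio.(book | paper).author"))
print(eval_path(db, "biblio._*.title"))
```

On this small database, `biblio.(book | paper).author`, `biblio._.author` and `biblio._*.author` all select the same four author nodes, because every author sits exactly one level below biblio.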

Page 14: Models and languages for semistructured data Bridging documents and databases.

Example of a label pattern

((b|B)ook | (a|A)uthor)(s)? - book, Book, author, Author, books, Books, authors, Authors
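The label pattern can be checked with Python's `re` module, which uses the same notation for alternation and optional parts. The list of candidate labels is made up for the illustration.

```python
import re

# The label pattern from the slide, written as a Python regular expression.
label_pattern = re.compile(r"((b|B)ook|(a|A)uthor)(s)?")

# Candidate labels (illustrative); the last two should be rejected.
labels = ["book", "Books", "author", "Authors", "booklet", "AUTHOR"]

# fullmatch requires the whole label to match, not just a prefix.
matching = [label for label in labels if label_pattern.fullmatch(label)]
print(matching)  # ['book', 'Books', 'author', 'Authors']
```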

Page 15: Models and languages for semistructured data Bridging documents and databases.

An exercise

biblio._*.author.(“[s|S]ection”)

Which ones of the following paths match the path expression above?

1. Biblio.author.Section
2. Biblio.cat.rat.hat.author.section
3. Biblio.author
4. Biblio.cat.author.section.Section

Page 16: Models and languages for semistructured data Bridging documents and databases.

A simple query

select author: X
from biblio.book.author X

Result: {author: “Darwin”, author: “Marx”}

Page 17: Models and languages for semistructured data Bridging documents and databases.

A query with a condition

select row: X
from biblio._ X
where “Crick” in X.author

Result: {row: {author: “Crick”, author: “Wallace”, date: 1956, title: “The spiral DNA”}, …}
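The select-from-where form reads much like a Python comprehension. The sketch below assumes the bibliography database is encoded as nested (label, value) lists, with the titles as drawn in the database figure; `children` is a hypothetical helper in which a missing label plays the role of the wildcard `_`.

```python
# Assumed encoding of the bibliography database as nested (label, value) lists.
db = [
    ("biblio", [
        ("paper", [("author", "Crick"), ("author", "Wallace"),
                   ("title", "DNAspiral"), ("date", 1956)]),
        ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
        ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
    ]),
]

def children(node, label=None):
    """Values below node; label=None accepts any label, like the wildcard _."""
    if not isinstance(node, list):
        return []
    return [v for (lbl, v) in node if label is None or lbl == label]

result = [("row", x)
          for biblio in children(db, "biblio")
          for x in children(biblio)              # from biblio._ X
          if "Crick" in children(x, "author")]   # where "Crick" in X.author

print(result)
```

Only the paper node survives the condition, so the result holds a single row.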

Page 18: Models and languages for semistructured data Bridging documents and databases.

Two exercises

select row: {title: Y, date: Z}
from biblio.paper X, X.title Y, X.date Z

select row: {author: Y, date: Z}
from biblio.book X, X.author Y, X.date Z

Page 19: Models and languages for semistructured data Bridging documents and databases.

A database

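[Figure: the database db as an edge-labelled graph. The root edge biblio leads to a paper node n1 (authors Crick and Wallace, title DNAspiral, date 1956), a book node n2 (author Darwin, title Origin, date 1848), a book node n3 (author Marx, title Kapital, date 1860), and further entries.]

select row: {title: Y, date: Z}
from biblio.paper X, X.title Y, X.date Z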

Page 20: Models and languages for semistructured data Bridging documents and databases.

A database

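[Figure: the database db as an edge-labelled graph. The root edge biblio leads to a paper node n1 (authors Crick and Wallace, title DNAspiral, date 1956), a book node n2 (author Darwin, title Origin, date 1848), a book node n3 (author Marx, title Kapital, date 1860), and further entries.]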

Page 21: Models and languages for semistructured data Bridging documents and databases.

Nested queries

select row: (select author: Y from X.author Y)

from biblio.book X
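A nested query corresponds to a comprehension inside a comprehension. As a sketch, again assuming the nested (label, value) encoding of the two books and a hypothetical `children` helper:

```python
# Assumed encoding of the book part of the bibliography database.
db = [
    ("biblio", [
        ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
        ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
    ]),
]

def children(node, label):
    """Return the values reached from node by edges carrying the given label."""
    if not isinstance(node, list):
        return []
    return [v for (lbl, v) in node if lbl == label]

result = [
    # inner query: select author: Y from X.author Y
    ("row", [("author", y) for y in children(x, "author")])
    for biblio in children(db, "biblio")
    for x in children(biblio, "book")            # outer query: from biblio.book X
]
print(result)
# [('row', [('author', 'Darwin')]), ('row', [('author', 'Marx')])]
```

Each book contributes one row that groups its own authors, instead of one flat list of all authors.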

Page 22: Models and languages for semistructured data Bridging documents and databases.

Three exercises

Which authors have written a book or a paper in 1992?

Which authors have written a book together with Jones?

Which authors have written both a book and a paper?

Page 23: Models and languages for semistructured data Bridging documents and databases.

Expressing relations

r1

a b c
1 2 3
3 2 2
4 3 1

r2

b d e
1 1 3
3 4 2
2 3 1

{ r1: { row: {a: 1, b: 2, c: 3}, row: {a: 3, b: 2, c: 2}, row: {a: 4, b: 3, c: 1} },
  r2: { row: {b: 1, d: 1, e: 3}, row: {b: 3, d: 4, e: 2}, row: {b: 2, d: 3, e: 1} } }

Page 24: Models and languages for semistructured data Bridging documents and databases.

Expressing relational joins

select a: A, d: D
from r1.row X, r2.row Y, X.a A, X.b B, Y.b B’, Y.d D
where B = B’
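The join can be traced by hand in Python. The row values below are read off the two tables r1 and r2 on the slide "Expressing relations", which is an assumption about how the flattened tables should be split into rows; since labels do not repeat within a row, a plain dict per row is enough here.

```python
# Rows of r1 (columns a, b, c) and r2 (columns b, d, e), as read from the tables.
r1 = [{"a": 1, "b": 2, "c": 3}, {"a": 3, "b": 2, "c": 2}, {"a": 4, "b": 3, "c": 1}]
r2 = [{"b": 1, "d": 1, "e": 3}, {"b": 3, "d": 4, "e": 2}, {"b": 2, "d": 3, "e": 1}]

result = [{"a": x["a"], "d": y["d"]}   # select a: A, d: D
          for x in r1                   # from r1.row X
          for y in r2                   #      r2.row Y
          if x["b"] == y["b"]]          # where B = B'

print(result)
# [{'a': 1, 'd': 3}, {'a': 3, 'd': 3}, {'a': 4, 'd': 4}]
```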

Page 25: Models and languages for semistructured data Bridging documents and databases.

Label variables

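select L: X
from biblio._*.L X
where matches(“.*Shakespeare.*”, X)

L is a label variable.

[Figure: the database db as an edge-labelled graph. The root edge biblio leads to a book node n2 (author Shakespeare, title Macbeth, date 1622), a book node n3 (author Smith, title Best of Shakespeare, date 1992), and further entries.]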

Page 26: Models and languages for semistructured data Bridging documents and databases.

Label variables

select L: X
from biblio._*.L X
where matches(“.*Shakespeare.*”, X)

{author: “Shakespeare”, title: “Best of Shakespeare”}
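A label variable binds to whatever label sits on the last edge of the path. The following sketch assumes the two-book database from the figure encoded as nested (label, value) lists, uses Python's `re.fullmatch` in place of `matches`, and restricts matching to string values; `walk` is a hypothetical helper.

```python
import re

# Assumed encoding of the two books from the figure.
db = [
    ("biblio", [
        ("book", [("author", "Shakespeare"), ("title", "Macbeth"), ("date", 1622)]),
        ("book", [("author", "Smith"), ("title", "Best of Shakespeare"),
                  ("date", 1992)]),
    ]),
]

def walk(node):
    """Yield (label, value) for every edge below node (tree data assumed)."""
    if isinstance(node, list):
        for label, value in node:
            yield label, value
            yield from walk(value)

result = [(label, value)                                   # select L: X
          for top_label, biblio in db if top_label == "biblio"
          for label, value in walk(biblio)                 # from biblio._*.L X
          if isinstance(value, str)
          and re.fullmatch(".*Shakespeare.*", value)]      # where matches(...)

print(result)
# [('author', 'Shakespeare'), ('title', 'Best of Shakespeare')]
```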

Page 27: Models and languages for semistructured data Bridging documents and databases.

Turning labels into data

select publ: {type: L, author: A}

from biblio.L X, X.author A

[Figure: the database db as an edge-labelled graph. The root edge biblio leads to a paper node n1 (authors Crick and Wallace, title DNAspiral, date 1956) and a book node n2 (author Darwin, title Origin, date 1848).]

{publ: {type: “paper”, author: “Crick”},
 publ: {type: “paper”, author: “Wallace”},
 publ: {type: “book”, author: “Darwin”}}
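In Python terms, the label variable L simply becomes an ordinary value that can be placed in the output. A sketch, assuming the paper and the book from the figure encoded as nested (label, value) lists:

```python
# Assumed encoding of the paper and the book shown in the figure.
db = [
    ("biblio", [
        ("paper", [("author", "Crick"), ("author", "Wallace"),
                   ("title", "DNAspiral"), ("date", 1956)]),
        ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ]),
]

result = [("publ", [("type", label), ("author", a)])   # select publ: {type: L, author: A}
          for top_label, biblio in db if top_label == "biblio"
          for label, x in biblio                        # from biblio.L X
          for a_label, a in x if a_label == "author"]   #      X.author A

for row in result:
    print(row)
```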

Page 28: Models and languages for semistructured data Bridging documents and databases.

An exercise

List all publications in 1992, their types, and titles.

Page 29: Models and languages for semistructured data Bridging documents and databases.

Basic XML syntax

XML is a textual representation of data

An element is a text bounded by tags

<name> John </name>

Here <name> is the start-tag, </name> is the end-tag, John is the content, and the whole is an element.

<name> </name> can be abbreviated as <name/>

Page 30: Models and languages for semistructured data Bridging documents and databases.

Basic XML syntax

Elements may contain subelements

<person>
<name> John </name>
<tel> 112233 </tel>
<email> [email protected] </email>
</person>

Page 31: Models and languages for semistructured data Bridging documents and databases.

XML attributes

An attribute is defined by a name-value pair within a tag

<price currency = "dollar"> 500 </price>

<length unit = "cm"> 25 </length>

Page 32: Models and languages for semistructured data Bridging documents and databases.

XML attributes and elements

<product>
<name> widget </name>
<price> 10 </price>
</product>

<product price = "10">
<name> widget </name>
</product>

<product name = "widget" price = "10"/>

Page 33: Models and languages for semistructured data Bridging documents and databases.

XML and ssd-expressions

<person>
<name> John </name>
<tel> 112233 </tel>
<email> [email protected] </email>
</person>

{person: {name: “John”, tel: 112233, email: “[email protected]”}}
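The correspondence between XML and ssd-expressions can be sketched with Python's standard `xml.etree.ElementTree` module. The function `to_ssd`, the (label, value) pair encoding, and the shortened person element (without the email) are assumptions of this illustration; attributes are ignored.

```python
import xml.etree.ElementTree as ET

def to_ssd(elem):
    """Convert an element to a (label, value) pair; attributes are ignored."""
    kids = list(elem)
    if kids:
        # An element with subelements becomes a nested ssd-expression.
        return (elem.tag, [to_ssd(k) for k in kids])
    # A leaf element becomes an atomic value (its trimmed text).
    return (elem.tag, (elem.text or "").strip())

xml_text = "<person><name> John </name><tel> 112233 </tel></person>"
ssd = [to_ssd(ET.fromstring(xml_text))]
print(ssd)
# [('person', [('name', 'John'), ('tel', '112233')])]
```

Note that the telephone number comes out as the string '112233': XML content is untyped text, whereas the ssd-expression on the slide writes it as a number.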

Page 34: Models and languages for semistructured data Bridging documents and databases.

XML references

<person id = "p1">
<name> John </name>
<tel> 112233 </tel>
</person>

<person id = "p2">
<name> Peter </name>
<tel> 998877 </tel>
<boss idref = "p1"/>
</person>

id is the element identifier; idref is a reference attribute.
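Resolving such a reference amounts to a lookup by identifier. The sketch below uses Python's standard `xml.etree.ElementTree`; the enclosing `<db>` element is an assumption added so that the two person elements form one well-formed document, and `by_id` is a hypothetical index.

```python
import xml.etree.ElementTree as ET

# The <db> wrapper is assumed; the two person elements are from the slide.
xml_text = """<db>
<person id="p1"><name> John </name><tel> 112233 </tel></person>
<person id="p2"><name> Peter </name><tel> 998877 </tel><boss idref="p1"/></person>
</db>"""

root = ET.fromstring(xml_text)

# Index every element that carries an id attribute.
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

# Follow Peter's boss reference to the element it points at.
peter = by_id["p2"]
boss = by_id[peter.find("boss").get("idref")]
boss_name = boss.find("name").text.strip()
print(boss_name)  # John
```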

Page 35: Models and languages for semistructured data Bridging documents and databases.

Document Type Definitions

<!DOCTYPE db [
<!ELEMENT db (person*)>
<!ELEMENT person (name, age, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
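The content models in this DTD are regular expressions over the sequence of child element names. The following Python sketch hand-translates them into `re` patterns and checks a document against them; it is not a DTD validator, and the table `content_models`, the function `conforms` and the sample documents are assumptions of the illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models: each child tag is followed by one space.
# An empty model stands for #PCDATA here, meaning no child elements.
content_models = {
    "db": r"(person )*",
    "person": r"name age email ",
    "name": r"",
    "age": r"",
    "email": r"",
}

def conforms(elem):
    """Check the child sequence of elem, and recursively of its children."""
    child_tags = "".join(k.tag + " " for k in elem)
    model = content_models.get(elem.tag)
    if model is None or not re.fullmatch(model, child_tags):
        return False
    return all(conforms(k) for k in elem)

good = ET.fromstring(
    "<db><person><name>A</name><age>1</age><email>x</email></person></db>")
bad = ET.fromstring(
    "<db><person><age>1</age><name>A</name><email>x</email></person></db>")

print(conforms(good), conforms(bad))  # True False
```

The second document is rejected only because age precedes name, which shows that the content model fixes the order of subelements.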

Page 36: Models and languages for semistructured data Bridging documents and databases.

An exercise on DTDs as schemas

<db>
<r1> <a> a1 </a> <b> b1 </b> </r1>
<r1> <a> a2 </a> <b> b2 </b> </r1>
<r2> <c> a1 </c> <d> b1 </d> </r2>
<r2> <c> c2 </c> <d> d2 </d> </r2>
<r3> <a> a1 </a> <c> b1 </c> </r3>
</db>

Write down a DTD for the data above!

Page 37: Models and languages for semistructured data Bridging documents and databases.

Attributes in DTDs

<product>
<name language = "Swedish" department = "music"> trumpet </name>
<price currency = "dollar"> 500 </price>
<length unit = "cm"> 25 </length>
</product>

<!ATTLIST name language CDATA #REQUIRED
               department CDATA #IMPLIED>
<!ATTLIST price currency CDATA #REQUIRED>
<!ATTLIST length unit CDATA #REQUIRED>

Page 38: Models and languages for semistructured data Bridging documents and databases.

Reference attributes in DTDs

<!DOCTYPE people [
<!ELEMENT people (person*)>
<!ELEMENT person (name)>
<!ELEMENT name (#PCDATA)>
<!ATTLIST person id ID #REQUIRED
                 boss IDREF #REQUIRED
                 friends IDREFS #IMPLIED>
]>

Page 39: Models and languages for semistructured data Bridging documents and databases.

An exercise

<people>
<person id = "sven" boss = "olle">
<name> Sven Svensson </name>
</person>
<person id = "olle" friends = "nils eva">
<name> Olle Olsson </name>
</person>
<person id = "pelle" boss = "nils eva">
<name> Per Persson </name>
</person>
</people>

Does this XML element conform to the previous DTD?

Page 40: Models and languages for semistructured data Bridging documents and databases.

Limitations of DTDs as schemas

DTDs impose order

No base types

The types of IDREFs cannot be constrained

Page 41: Models and languages for semistructured data Bridging documents and databases.

XSL - extensible stylesheet language

<bib>
<book>
<title> t1 </title>
<author> a1 </author>
<author> a2 </author>
</book>
<paper>
<title> t2 </title>
<author> a3 </author>
<author> a4 </author>
</paper>
<book>
<title> t3 </title>
<author> a5 </author>
<author> a6 </author>
</book>
</bib>

Page 42: Models and languages for semistructured data Bridging documents and databases.

Template rules and XSL patterns

<xsl:template>
<xsl:apply-templates/>
</xsl:template>

<xsl:template match = "bib/*/title">
<result>
<xsl:value-of/>
</result>
</xsl:template>

Each xsl:template element is a template rule; the value of its match attribute, bib/*/title, is an XSL pattern.

Result:

<result> t1 </result>
<result> t2 </result>
<result> t3 </result>
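The effect of the pattern bib/*/title can be imitated in Python with the path support of the standard `xml.etree.ElementTree` module. This is not XSL; it is only a sketch of what the second template rule selects and emits, assuming the bib document from the XSL slide as input.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
<book> <title> t1 </title> <author> a1 </author> <author> a2 </author> </book>
<paper> <title> t2 </title> <author> a3 </author> <author> a4 </author> </paper>
<book> <title> t3 </title> <author> a5 </author> <author> a6 </author> </book>
</bib>""")

results = []
# "*/title" relative to <bib> plays the role of the pattern bib/*/title:
# any child of bib, then a title element below it.
for title in bib.findall("*/title"):
    result = ET.Element("result")
    result.text = title.text              # copy the title's text content
    results.append(ET.tostring(result, encoding="unicode"))

for line in results:
    print(line)
```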

Page 43: Models and languages for semistructured data Bridging documents and databases.

Two exercises

select row: {title: Y, date: Z}
from biblio.paper X, X.title Y, X.date Z

{row: {title: “The spiral DNA”, date: 1956}, {title: “Origin”, date: 1848}, {title: “Kapital”, date: 1860}}

select row: {author: Y, date: Z}
from biblio.book X, X.author Y, X.date Z

Page 44: Models and languages for semistructured data Bridging documents and databases.

Which authors have written a book or a paper in 1992?

select author: X
from biblio.(book | paper) Y, Y.author X
where Y.date = 1992

Page 45: Models and languages for semistructured data Bridging documents and databases.

Which authors have written a book together with Jones?

select author: X
from biblio.book Y, Y.author X
where “Jones” in Y.author

Page 46: Models and languages for semistructured data Bridging documents and databases.

Which authors have written both a book and a paper?

select author: A
from biblio.book B, biblio.paper P, B.author A
where B.author = P.author

select author: A1
from biblio.book B, biblio.paper P, B.author A1, P.author A2
where A1 = A2

Page 47: Models and languages for semistructured data Bridging documents and databases.

List all publications in 1992, their types, and titles.

select publ: {type: L, title: T}
from biblio.L X, X.title T
where X.date = 1992

Page 48: Models and languages for semistructured data Bridging documents and databases.

<!DOCTYPE db [
<!ELEMENT db (r1*, r2*, r3*)>
<!ELEMENT r1 (a, b)>
<!ELEMENT r2 (c, d)>
<!ELEMENT r3 (a, c)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT c (#PCDATA)>
<!ELEMENT d (#PCDATA)>
]>

<db>
<r1> <a> a1 </a> <b> b1 </b> </r1>
<r1> <a> a2 </a> <b> b2 </b> </r1>
<r2> <c> a1 </c> <d> b1 </d> </r2>
<r2> <c> c2 </c> <d> d2 </d> </r2>
<r3> <a> a1 </a> <c> b1 </c> </r3>
</db>