Lecture 10: Database Design

XML

Wednesday, October 20, 2004

Outline

• Design of a Relational schema (3.6)

• XML

Normal Forms

First Normal Form = all attributes are atomic

Second Normal Form (2NF) = old and obsolete

Third Normal Form (3NF) = this lecture

Boyce Codd Normal Form (BCNF) = this lecture

Others...

Boyce-Codd Normal Form

A simple condition for removing anomalies from relations:

In English (though a bit vague):

Whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.

A relation R is in BCNF if:

If A1, ..., An → B is a non-trivial dependency in R, then {A1, ..., An} is a key (more precisely, a super-key) for R
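
The condition can be tested mechanically once we can compute the closure X+ of a set of attributes. The following is a minimal Python sketch, not the lecture's own code: the function names `closure` and `is_bcnf` and the representation of an FD as a pair of sets are assumptions made for illustration. It checks only the FDs that are listed, using the Person relation from this lecture as data.

```python
def closure(attrs, fds):
    """Return X+ for the attribute set attrs under the FDs in fds.

    Each FD is a (lhs, rhs) pair of attribute sets (an assumed encoding).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its left side is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(all_attrs, fds):
    """True if every listed non-trivial FD has a left side that determines everything."""
    all_attrs = set(all_attrs)
    for lhs, rhs in fds:
        nontrivial = not set(rhs) <= set(lhs)
        if nontrivial and closure(lhs, fds) != all_attrs:
            return False
    return True


person = {"name", "SSN", "age", "hairColor", "phoneNumber"}
person_fds = [({"SSN"}, {"name", "age"}), ({"age"}, {"hairColor"})]

ssn_closure = closure({"SSN"}, person_fds)
person_ok = is_bcnf(person, person_fds)
```

Here `ssn_closure` does not contain phoneNumber, so SSN is not a key and `person_ok` comes out false.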

BCNF Decomposition Algorithm

Repeat
  choose A1, …, Am → B1, …, Bn that violates the BCNF condition
  split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
  continue with both R1 and R2
Until no more violations

[Figure: the attributes of R drawn as three groups, A's, B's and Others; R1 covers the A's and B's, R2 covers the A's and Others.]

Is there a 2-attribute relation that is not in BCNF?

In practice, we have a better algorithm (next):

BCNF Decomposition Algorithm

BCNF_Decompose(R)
  find X s.t.: X ≠ X+ ≠ [all attributes]
  if (not found) then “R is in BCNF”
  else
    let Y = X+ - X
    let Z = [all attributes] - X+
    decompose into R1(X ∪ Y) and R2(X ∪ Z)
    BCNF_Decompose(R1)
    BCNF_Decompose(R2)
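
The steps of BCNF_Decompose can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the lecture's implementation: FDs are encoded as pairs of sets, the search for X is a brute-force scan over subsets (exponential, fine only for small relations), and closures are restricted to the attributes of the relation being split. The names `closure` and `bcnf_decompose` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return X+ for attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_decompose(rel, fds):
    """Return a list of attribute sets obtained by recursive splitting."""
    rel = frozenset(rel)
    ordered = sorted(rel)
    # X = all attributes can never be a violation, so stop one size short.
    for size in range(1, len(ordered)):
        for combo in combinations(ordered, size):
            x = set(combo)
            x_plus = closure(x, fds) & rel      # closure inside this relation
            if x != x_plus and x_plus != rel:   # X != X+ != [all attributes]
                r1 = x_plus                     # R1(X u Y), because X u Y = X+
                r2 = x | (rel - x_plus)         # R2(X u Z)
                return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [rel]                                # no such X: rel is in BCNF


# R(A,B,C,D) with A -> B and B -> C
result = bcnf_decompose("ABCD", [({"A"}, {"B"}), ({"B"}, {"C"})])
```

Which X is found first depends on the scan order, so a different order may produce a different (still BCNF) decomposition.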

Example BCNF Decomposition

Person(name, SSN, age, hairColor, phoneNumber)

SSN → name, age
age → hairColor

Find X s.t.: X ≠ X+ ≠ [all attributes]

Iteration 1: Person
  SSN+ = SSN, name, age, hairColor
  Decompose into: P(SSN, name, age, hairColor)
                  Phone(SSN, phoneNumber)

Iteration 2: P
  age+ = age, hairColor
  Decompose: People(SSN, name, age)
             Hair(age, hairColor)
             Phone(SSN, phoneNumber)

What is the key?

Other Example

• R(A,B,C,D) A → B, B → C

• Iteration 1: X = A: A+ = ABC
  – split R into R1(A,B,C), R2(A,D)

• Iteration 2: X = B: B+ = BC
  – split R1 into R3(B,C), R4(A,B); R is now R3(B,C), R4(A,B), R2(A,D)

• What happens if at iteration 1 we pick X = AB?

What is the key?

3NF: A Problem with BCNF

R(Unit, Company, Product)

FDs: Unit → Company
     Company, Product → Unit

Unit+ = Unit, Company

So Unit → Company violates BCNF. Decompose R into:

R1(Unit, Company)   with FD: Unit → Company
R2(Unit, Product)   with no FDs

We lose the FD: Company, Product → Unit !!

So What’s the Problem?

R1:
Unit      Company
Galaga99  UW
Bingo     UW

R2:
Unit      Product
Galaga99  Databases
Bingo     Databases

FD on R1: Unit → Company

No problem so far. All local FD’s are satisfied.
Let’s put all the data back into a single table again:

Unit      Company  Product
Galaga99  UW       Databases
Bingo     UW       Databases

Violates the FD: Company, Product → Unit
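
The same check can be run on the rows shown above. The snippet below is a small Python sketch, assuming tables are lists of tuples and columns are addressed by position; the helper names `join_on_unit` and `satisfies_fd` are made up for this illustration.

```python
def join_on_unit(r1, r2):
    """Join (unit, company) rows with (unit, product) rows on the unit column."""
    return [(unit1, company, product)
            for (unit1, company) in r1
            for (unit2, product) in r2
            if unit1 == unit2]


def satisfies_fd(rows, lhs, rhs):
    """True if any two rows that agree on the lhs columns also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True


r1 = [("Galaga99", "UW"), ("Bingo", "UW")]
r2 = [("Galaga99", "Databases"), ("Bingo", "Databases")]
joined = join_on_unit(r1, r2)

local_ok = satisfies_fd(r1, lhs=[0], rhs=[1])            # Unit -> Company
global_ok = satisfies_fd(joined, lhs=[1, 2], rhs=[0])    # Company, Product -> Unit
```

The local FD holds on R1, but the joined table has two different units for the same (Company, Product) pair, so the second check fails.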

The Problem

• We started with a table R and FD

• We decomposed R into BCNF tables R1, R2, … with their own FD1, FD2, …

• We can reconstruct R from R1, R2, …

• But we cannot reconstruct FD from FD1, FD2, …

Solution: 3rd Normal Form (3NF)

A simple condition for removing anomalies from relations:

A relation R is in 3rd normal form if:

Whenever there is a nontrivial dependency A1, A2, ..., An → B for R, then {A1, A2, ..., An} is a super-key for R, or B is part of a key.

Tradeoff:
BCNF = no anomalies, but may lose some FDs
3NF = keeps all FDs, but may have some anomalies

3NF Decomposition Algorithm

3NF_Decompose(R)
  let K = [all attributes that are part of some key]
  find X s.t.: X+ - X - K ≠ ∅ and X+ ≠ [all attributes]
  if (not found) then “R is in 3NF”
  else
    let Y = X+ - X - K
    let Z = [all attributes] - (X ∪ Y)
    decompose into R1(X ∪ Y) and R2(X ∪ Z)
    3NF_Decompose(R1)
    3NF_Decompose(R2)
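
A direct transcription of these steps into Python might look like the sketch below. It is only an illustration under assumptions: FDs are pairs of sets, keys are found by brute force over subsets, and closures are restricted to the relation being split. The names `closure`, `keys_of` and `decompose_3nf` are hypothetical. The test data is R(A,B,C,D,E) with AB → C, C → D, D → B, D → E.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return X+ for attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def keys_of(rel, fds):
    """Minimal subsets of rel whose closure covers rel (brute force)."""
    keys = []
    for size in range(1, len(rel) + 1):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            # Skip supersets of keys already found: they are not minimal.
            if not any(k <= x for k in keys) and closure(x, fds) >= rel:
                keys.append(x)
    return keys


def decompose_3nf(rel, fds):
    """Return a list of attribute sets, following 3NF_Decompose step by step."""
    rel = frozenset(rel)
    k = set().union(*keys_of(rel, fds))          # attributes in some key
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel
            y = x_plus - x - k
            if y and x_plus != rel:              # the violation test
                z = rel - (x | y)
                return decompose_3nf(x | y, fds) + decompose_3nf(x | z, fds)
    return [rel]                                 # no such X: rel is in 3NF


fds = [({"A", "B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"B"}), ({"D"}, {"E"})]
keys = keys_of(frozenset("ABCDE"), fds)
result = decompose_3nf("ABCDE", fds)
```

With the scan order used here the first violating X is C, which gives the split into (C, E) and (A, B, C, D).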

Example of 3NF decomposition

R(A,B,C,D,E):
AB → C
C → D
D → B
D → E

Keys: (need to compute X+, for several Xs) AB, AC, AD

K = {A, B, C, D}

Pick X = C
C+ = BCDE
C → BDE is a BCNF violation
For 3NF: remove B, D (part of K):
C → E is a 3NF violation
Decompose: R1(C, E), R2(A,B,C,D)

R1 is in 3NF
R2 is in 3NF (because its keys: AB, AC, AD)

3NF vs. BCNF Decomposition

[Figure: a relation with attributes A B C D E F G H K decomposed in stages (A B C D E and E F G H K; then A B C, C D E, E F G, G H K; then two-attribute relations A B), with the 3NF and BCNF results labeled.]

XML Outline

• XML (4.6, 4.7)
  – This lecture: syntax, semistructured data
  – Next lectures: DTDs, XPath, XQuery

Additional Readings on XML

• XQuery from the Experts, Katz, Ed.
  – The reference on XQuery

• http://www.w3.org/XML/1999/XML-in-10-points
• www.zvon.org/xxl/XMLTutorial/General/book_en.html
• http://db.bell-labs.com/galax/
• Main source: www.w3.org (but hard to read)

XML

• eXtensible Markup Language

• XML 1.0 – a recommendation from W3C, 1998

• Roots: SGML (a very nasty language).

• After the roots: a format for sharing data

XML Data

• Relational data does not have a syntax
  – I can’t “give” you my relational database
  – Need to import it from some other syntax, like CSV (comma-separated values)

• XML = rich syntax for data
  – But XML is not relational: semistructured

• Usage:
  – Map any data to XML
  – Store it in files, exchange on the Web, etc.
  – Even query it directly, using XPath, XQuery

From HTML to XML

HTML describes the presentation

HTML

<h1> Bibliography </h1>

<p> <i> Foundations of Databases </i>

Abiteboul, Hull, Vianu

<br> Addison Wesley, 1995

<p> <i> Data on the Web </i>

Abiteboul, Buneman, Suciu

<br> Morgan Kaufmann, 1999

XML

<bibliography>

<book> <title> Foundations… </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<publisher> Addison Wesley </publisher>

<year> 1995 </year>

</book>

</bibliography>

XML describes the content

XML Terminology

• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red> abbrv. <red/>
• an XML document: single root element

well formed XML document: if it has matching tags
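
A parser will reject a document that is not well formed, which gives a quick way to test the rules above. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<book><title>Foundations</title><author>Hull</author></book>"
bad_nesting = "<book><title>Foundations</book></title>"   # tags do not match
two_roots = "<book/><book/>"                               # more than one root
empty_long = "<red></red>"
empty_short = "<red/>"                                     # abbreviated form

results = [is_well_formed(s)
           for s in (good, bad_nesting, two_roots, empty_long, empty_short)]
```

Only the mismatched tags and the two-root document are rejected; both spellings of the empty element are accepted.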

More XML: Attributes

<book price = "55" currency = "USD">

<title> Foundations of Databases </title>

<author> Abiteboul </author>

<year> 1995 </year>

</book>

attributes are alternative ways to represent data
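
To a program reading the document, attributes and sub-elements are reached through different calls but both arrive as plain strings. A small sketch with Python's standard `xml.etree.ElementTree`, using the book element above; the `isbn` attribute lookup is a made-up example of an attribute that is not present.

```python
import xml.etree.ElementTree as ET

doc = """
<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  <year> 1995 </year>
</book>
"""

book = ET.fromstring(doc)
price = book.get("price")                # attribute value, always a string
currency = book.attrib["currency"]       # the same data via the attrib dict
year = book.find("year").text.strip()    # element content, also a string
missing = book.get("isbn")               # hypothetical absent attribute -> None
```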

More XML: Oids and References

<person id="o555"> <name> Jane </name> </person>

<person id="o456"> <name> Mary </name>

<children idref="o123 o555"/>

</person>

<person id="o123" mother="o456"><name>John</name>

</person>

oids and references in XML are just syntax
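
Because the references are just strings, the application has to resolve them itself. The sketch below shows one way to do that in Python with the standard `xml.etree.ElementTree`. The `<people>` wrapper element is an assumption added so the three persons form a single-root document, and the helper `name_of` is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """
<people>
  <person id="o555"> <name> Jane </name> </person>
  <person id="o456"> <name> Mary </name>
    <children idref="o123 o555"/>
  </person>
  <person id="o123" mother="o456"> <name> John </name> </person>
</people>
"""

root = ET.fromstring(doc)
# Build the oid -> element index by hand; the parser does not do it for us.
by_id = {person.get("id"): person for person in root.iter("person")}


def name_of(person):
    return person.find("name").text.strip()


mary = by_id["o456"]
child_ids = mary.find("children").get("idref").split()
child_names = [name_of(by_id[oid]) for oid in child_ids]
john_mother = name_of(by_id[by_id["o123"].get("mother")])
```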

More XML: CDATA Section

• Syntax: <![CDATA[ .....any text here...]]>

• Example:

<example> <![CDATA[ some text here </notAtag> <>]]></example>

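
Parsing the example shows that everything inside the CDATA section reaches the application as ordinary text. A minimal sketch with Python's standard `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

doc = "<example><![CDATA[ some text here </notAtag> <>]]></example>"

example = ET.fromstring(doc)
text = example.text          # the CDATA content, markup characters included
child_count = len(example)   # </notAtag> was not parsed as a tag
```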

More XML: Entity References

• Syntax: &entityname;

• Example: <element> this is less than &lt; </element>

• Some entities:
  &lt;    <
  &gt;    >
  &amp;   &
  &apos;  '
  &quot;  "
  &#38;   numeric character reference to a Unicode char (code 38 is &)
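
A parser replaces entity references with the characters they stand for, and a writer has to do the reverse. The following sketch uses Python's standard `xml.etree.ElementTree` for parsing and `xml.sax.saxutils.escape` for escaping; the element name `e` and the sample strings are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

element = ET.fromstring("<element> this is less than &lt; </element>")
parsed_text = element.text                     # &lt; has become <

numeric = ET.fromstring("<e>&#38;</e>").text   # character reference, code 38
all_five = ET.fromstring("<e>&lt;&gt;&amp;&apos;&quot;</e>").text

escaped = escape("a < b & c")                  # text -> safe element content
```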

More XML: Processing Instructions

• Syntax: <?target argument?>

• Example:

<product> <name> Alarm Clock </name> <?ringBell 20?> <price> 19.99 </price></product>

• What do they mean?
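
XML itself attaches no meaning to a processing instruction; the parser only hands the target and its argument to the application. A small sketch with Python's standard `xml.dom.minidom`, which keeps processing instructions as nodes in the tree:

```python
from xml.dom import minidom

doc = ("<product> <name> Alarm Clock </name> <?ringBell 20?> "
       "<price> 19.99 </price></product>")

product = minidom.parseString(doc).documentElement
# Collect (target, argument) for every processing instruction child.
instructions = [(node.target, node.data)
                for node in product.childNodes
                if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]
```

What to do with the pair `("ringBell", "20")` is entirely up to the program that reads it.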

More XML: Comments

• Syntax <!-- .... Comment text... -->

• Yes, they are part of the data model !!!

XML Namespaces

• http://www.w3.org/TR/REC-xml-names (1/99)

• name ::= [prefix:]localpart

<book xmlns:isbn="www.isbn-org.org/def">

<title> … </title>

<number> 15 </number>

<isbn:number> …. </isbn:number>

</book>


XML Namespaces

• syntactic: <number> , <isbn:number>

• semantic: provide URL for schema

<tag xmlns:mystyle = "http://…">

<mystyle:title> … </mystyle:title>

<mystyle:number> …

</tag>

The mystyle: elements belong to this namespace.
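
A namespace-aware parser keeps <number> and <isbn:number> apart by expanding the prefix into the namespace name. The sketch below uses Python's standard `xml.etree.ElementTree` on the earlier book example. The title text and the value `placeholder-isbn` are placeholders invented here, since the slide elides the real content.

```python
import xml.etree.ElementTree as ET

doc = """
<book xmlns:isbn="www.isbn-org.org/def">
  <title> placeholder title </title>
  <number> 15 </number>
  <isbn:number> placeholder-isbn </isbn:number>
</book>
"""

book = ET.fromstring(doc)
ns = {"isbn": "www.isbn-org.org/def"}        # prefix -> namespace name

plain = book.find("number").text.strip()                 # the unqualified element
qualified = book.find("isbn:number", ns).text.strip()    # the namespaced element
expanded_tag = book.find("isbn:number", ns).tag          # how the parser names it
```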

From Relational Data to XML Data

Persons

Name  Phone
John  3634
Sue   6343
Dick  6363

XML:

<persons>
  <row> <name>John</name> <phone>3634</phone> </row>
  <row> <name>Sue</name> <phone>6343</phone> </row>
  <row> <name>Dick</name> <phone>6363</phone> </row>
</persons>

[Figure: the XML drawn as a tree: a persons root with three row children, each with a name leaf and a phone leaf (“John” 3634, “Sue” 6343, “Dick” 6363).]
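
The mapping from a table to XML is mechanical: one element for the table, one row element per tuple, one element per column value. A minimal Python sketch with the standard `xml.etree.ElementTree`; the function name `table_to_xml` is an assumption, and the data is the Persons table above.

```python
import xml.etree.ElementTree as ET


def table_to_xml(table_name, columns, rows):
    """Build <table><row><col>value</col>...</row>...</table> from tuples."""
    root = ET.Element(table_name)
    for values in rows:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, values):
            ET.SubElement(row, column).text = str(value)
    return root


persons = table_to_xml("persons", ["name", "phone"],
                       [("John", 3634), ("Sue", 6343), ("Dick", 6363)])
xml_text = ET.tostring(persons, encoding="unicode")
```

The column names end up repeated in every row, which is the price of carrying the schema inside the data.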

XML Data

• XML is self-describing

• Schema elements become part of the data
  – Relational schema: persons(name,phone)
  – In XML <persons>, <name>, <phone> are part of the data, and are repeated many times

• Consequence: XML is much more flexible

• XML = semistructured data

Semi-structured Data Explained

• Missing attributes:

<person> <name> John</name> <phone>1234</phone> </person>

<person> <name>Joe</name></person>        no phone !

• Could represent in a table with nulls

name  phone
John  1234
Joe   -

Semi-structured Data Explained

• Repeated attributes

<person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone></person>        two phones !

• Impossible in tables:

name  phone
Mary  2345  3456  ???

Semi-structured Data Explained

• Attributes with different types in different objects

<person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone></person>        structured name !

• Nested collections (no 1NF)
• Heterogeneous collections:
  – <db> contains both <book>s and <publisher>s
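
Code that reads semistructured data has to be ready for all of these cases at once. The sketch below gathers the person examples from the last three slides under a <db> root (the grouping is an assumption) and reads them with Python's standard `xml.etree.ElementTree`; the helper `read_person` is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """
<db>
  <person> <name> John </name> <phone>1234</phone> </person>
  <person> <name> Joe </name> </person>
  <person> <name> Mary </name> <phone>2345</phone> <phone>3456</phone> </person>
  <person> <name> <first> John </first> <last> Smith </last> </name>
           <phone>1234</phone> </person>
</db>
"""


def read_person(person):
    """Return (name, phones); name is a string or a dict of name parts."""
    name = person.find("name")
    if len(name):   # structured name: the element has children
        name_value = {part.tag: part.text.strip() for part in name}
    else:           # atomic name: plain text
        name_value = name.text.strip()
    # findall copes with zero, one, or many phones.
    phones = [phone.text.strip() for phone in person.findall("phone")]
    return name_value, phones


people = [read_person(p) for p in ET.fromstring(doc).findall("person")]
```

Joe gets an empty phone list rather than a null, Mary gets two entries, and the last person's name comes back as a nested value.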