Lecture 13: XQuery, XML Publishing, XML Storage
Monday, October 28, 2002


Jan 19, 2016


Lester Robinson


Organization

Project:
• Next phase: you need to form companies
• Please form groups of 3 and email Tessa by Thursday
• Problems (a little extra credit):
  – One group will have only two people
  – One billing group and one shipping group volunteer to do inventory

Homework:
• Good practice for the midterm
• Try to finish before Monday


Organization

Midterm

• Next Monday, 11/3

• Missed it? You will get this score:
  – MidtermScore = 100 - 1.2 × (100 - FinalScore)
  – In other words, you will lose 20% more points than on the final


Overview

• DTDs: elements and attributes

• XQuery


Very Simple DTD

<!DOCTYPE company [
  <!ELEMENT company ((person|product)*)>
  <!ELEMENT person (ssn, name, office, phone?)>
  <!ELEMENT ssn (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT office (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT product (pid, name, description?)>
  <!ELEMENT pid (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
]>


Very Simple DTD

Example of a valid XML document:

<company>
  <person>
    <ssn> 123456789 </ssn>
    <name> John </name>
    <office> B432 </office>
    <phone> 1234 </phone>
  </person>
  <person>
    <ssn> 987654321 </ssn>
    <name> Jim </name>
    <office> B123 </office>
  </person>
  <product> ... </product>
  ...
</company>


Content Model

• Element content: what we can put in an element (aka content model)

• Content model:
  – Complex = a regular expression over other elements
  – Text-only = #PCDATA
  – Empty = EMPTY
  – Any = ANY
  – Mixed content = (#PCDATA | A | B | C)* (i.e., very restricted)
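To make "a regular expression over other elements" concrete, here is a minimal Python sketch (not a DTD validator). It hand-translates the person and company content models from the DTD above into Python regular expressions over the sequence of child tag names, and checks them with xml.etree.ElementTree. The translation and the helper name matches_model are illustrative assumptions; text content (#PCDATA) is ignored, and real validators compile the DTD themselves.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from <!ELEMENT person (ssn, name, office, phone?)>:
# "," becomes concatenation and "?" stays "?". Each child tag is written
# as "<tag>" so that tag names cannot run into each other.
PERSON_MODEL = re.compile(r"<ssn><name><office>(?:<phone>)?")

# Hand-translated from <!ELEMENT company ((person|product)*)>.
COMPANY_MODEL = re.compile(r"(?:<person>|<product>)*")


def matches_model(elem, model):
    """True if the sequence of elem's child tags matches the content model."""
    children = "".join(f"<{child.tag}>" for child in elem)
    return model.fullmatch(children) is not None


john = ET.fromstring("<person><ssn>123456789</ssn><name>John</name>"
                     "<office>B432</office><phone>1234</phone></person>")
jim = ET.fromstring("<person><ssn>987654321</ssn><name>Jim</name>"
                    "<office>B123</office></person>")
swapped = ET.fromstring("<person><name>Jim</name><ssn>987654321</ssn>"
                        "<office>B123</office></person>")

print(matches_model(john, PERSON_MODEL))     # True
print(matches_model(jim, PERSON_MODEL))      # True (phone is optional)
print(matches_model(swapped, PERSON_MODEL))  # False (order matters)
```

The "<tag>" encoding keeps the regex unambiguous; matching on raw concatenated names could confuse, say, a tag "ab" with tags "a" followed by "b".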


Attributes in DTDs

<!ELEMENT person (ssn, name, office, phone?)>
<!ATTLIST person age CDATA #REQUIRED>

<person age="25"> <name> ....</name> ...</person>


Attributes in DTDs

<!ELEMENT person (ssn, name, office, phone?)>
<!ATTLIST person age     CDATA  #REQUIRED
                 id      ID     #REQUIRED
                 manager IDREF  #REQUIRED
                 manages IDREFS #REQUIRED>

<person age="25" id="p29432" manager="p48293" manages="p34982 p423234">
  <name> ....</name> ...
</person>


Attributes in DTDs

Types:

• CDATA = string

• ID = key

• IDREF = foreign key

• IDREFS = foreign keys separated by spaces

• (Monday | Wednesday | Friday) = enumeration

• NMTOKEN = must be a valid XML name

• NMTOKENS = multiple valid XML names

• ENTITY = you don’t want to know this
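The ID = key, IDREF = foreign key analogy can be sketched in Python: build an index from ID values to elements, then follow IDREF (one token) and IDREFS (space-separated tokens) through it. This is a minimal sketch under stated assumptions: ElementTree does not read the DTD, so the code assumes the ID-typed attribute is literally named id and treats manager/manages as IDREF/IDREFS. The helper names index_ids and deref and the person names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: ids follow the slide example, names are made up.
DOC = """<company>
  <person id="p29432" manager="p48293" manages="p34982 p423234"><name>Ann</name></person>
  <person id="p48293"><name>Bob</name></person>
  <person id="p34982"><name>Cy</name></person>
  <person id="p423234"><name>Di</name></person>
</company>"""


def index_ids(root, attr="id"):
    """Key index: ID value -> element. IDs must be unique within a document."""
    index = {}
    for elem in root.iter():
        key = elem.get(attr)
        if key is None:
            continue
        if key in index:
            raise ValueError(f"duplicate ID {key!r}")
        index[key] = elem
    return index


def deref(index, value):
    """Follow an IDREF (one token) or IDREFS (space-separated tokens).

    A dangling reference raises KeyError, like a violated foreign key.
    """
    return [index[token] for token in value.split()]


root = ET.fromstring(DOC)
ids = index_ids(root)
ann = ids["p29432"]
print([p.findtext("name") for p in deref(ids, ann.get("manager"))])  # ['Bob']
print([p.findtext("name") for p in deref(ids, ann.get("manages"))])  # ['Cy', 'Di']
```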


Attributes in DTDs

Kind:
• #REQUIRED = must be present
• #IMPLIED = optional
• value = default value
• #FIXED value = the only value allowed


Using DTDs

• The DTD must be included in the XML document
• Either include the entire DTD:
  – <!DOCTYPE rootElement [ ....... ]>
• Or include a reference to it:
  – <!DOCTYPE rootElement SYSTEM "http://www.mydtd.org">
• Or mix the two... (e.g., to override the external definition)


FLWR (“Flower”) Expressions

FOR ...
LET ...
WHERE ...
RETURN ...


Sample Data for Queries (more or less)

<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author>
      <first-name> Rick </first-name>
      <last-name> Hull </last-name>
    </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>


FOR-WHERE-RETURN

Find all book titles published after 1995:

FOR $x IN document("bib.xml")/bib/book
WHERE $x/year/text() > 1995
RETURN $x/title

Result: <title> abc </title> <title> def </title> <title> ghi </title>
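The FOR, WHERE and RETURN clauses map naturally onto a loop, a test and an append. The Python sketch below emulates this query with xml.etree.ElementTree over a trimmed copy of the sample data; it is an illustration of the semantics, not an XQuery engine. The function name titles_after is hypothetical, and float() stands in for XQuery's conversion of the untyped year text to a number.

```python
import xml.etree.ElementTree as ET

# The sample bib data, trimmed to the fields this query touches.
BIB = """<bib>
  <book><author> Serge Abiteboul </author>
    <title> Foundations of Databases </title><year> 1995 </year></book>
  <book price="55"><author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title><year> 1998 </year></book>
</bib>"""


def titles_after(bib, year):
    result = []
    for x in bib.findall("book"):               # FOR $x IN .../bib/book
        y = x.findtext("year")
        # WHERE $x/year/text() > 1995; float() tolerates the surrounding spaces.
        # A book with no <year> is skipped (comparing an empty value is false).
        if y is not None and float(y) > year:
            result.append(x.find("title"))      # RETURN $x/title
    return result


bib = ET.fromstring(BIB)
print([t.text.strip() for t in titles_after(bib, 1995)])
# ['Principles of Database and Knowledge Base Systems']
```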


FOR-WHERE-RETURN

Equivalently (perhaps more geekish)

FOR $x IN document("bib.xml")/bib/book[year/text() > 1995]/title
RETURN $x

And even shorter:

document("bib.xml")/bib/book[year/text() > 1995]/title


FOR-WHERE-RETURN

• Find all book titles and the year when they were published:

FOR $x IN document("bib.xml")/bib/book
RETURN <answer>
         <title>{ $x/title/text() } </title>
         <year>{ $x/year/text() } </year>
       </answer>


FOR-WHERE-RETURN

• Notice the use of "{" and "}"

• What is the result without them?

FOR $x IN document("bib.xml")/bib/book
RETURN <answer>
         <title> $x/title/text() </title>
         <year> $x/year/text() </year>
       </answer>


XQuery: Nesting

For each author of a book by Morgan Kaufmann, list all books she published:

FOR $b IN document("bib.xml")/bib,
    $a IN $b/book[publisher/text()="Morgan Kaufmann"]/author
RETURN <result>
         { $a,
           FOR $t IN $b/book[author/text()=$a/text()]/title
           RETURN $t }
       </result>

In the RETURN clause, the comma concatenates XML fragments.
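The nested query is an outer loop over qualifying authors with an inner loop over all books, and the comma simply appends both pieces to the same constructed element. The Python sketch below mirrors that structure with ElementTree; the data and the function name books_of_authors are hypothetical, and copy.deepcopy models the fact that an element constructor copies the nodes it encloses.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical data: publishers, authors and titles are made up.
BIB = """<bib>
  <book><publisher>Morgan Kaufmann</publisher><author>Jones</author><title>abc</title></book>
  <book><publisher>Freeman</publisher><author>Jones</author><title>def</title></book>
  <book><publisher>Morgan Kaufmann</publisher><author>Smith</author><title>ghi</title></book>
</bib>"""


def books_of_authors(bib, publisher):
    out = []
    for book in bib.findall("book"):
        if book.findtext("publisher") != publisher:     # [publisher/text()=...]
            continue
        for a in book.findall("author"):                # $a IN .../author
            result = ET.Element("result")
            result.append(copy.deepcopy(a))             # { $a, ...
            for t_book in bib.findall("book"):          # nested FOR $t IN ...
                if any(x.text == a.text for x in t_book.findall("author")):
                    result.append(copy.deepcopy(t_book.find("title")))  # RETURN $t
            out.append(result)
    return out


bib = ET.fromstring(BIB)
for r in books_of_authors(bib, "Morgan Kaufmann"):
    print(ET.tostring(r, encoding="unicode"))
# <result><author>Jones</author><title>abc</title><title>def</title></result>
# <result><author>Smith</author><title>ghi</title></result>
```

Note that an author with two Morgan Kaufmann books would produce two identical <result> elements, exactly as the XQuery version would.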


XQuery

Result:

<result>
  <author>Jones</author>
  <title> abc </title>
  <title> def </title>
</result>
<result>
  <author> Smith </author>
  <title> ghi </title>
</result>


Aggregates

Find all books with more than 3 authors:

• count = a function that counts
• avg = computes the average
• sum = computes the sum
• distinct-values = eliminates duplicates

FOR $x IN document("bib.xml")/bib/book
WHERE count($x/author) > 3
RETURN $x


Aggregates

Same thing:

FOR $x IN document("bib.xml")/bib/book[count(author) > 3]
RETURN $x


Aggregates

Print all authors who published more than 3 books – be aware of duplicates!

FOR $b IN document("bib.xml")/bib,
    $a IN distinct-values($b/book/author/text())
WHERE count($b/book[author/text()=$a]) > 3
RETURN <author> { $a } </author>
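The two count queries can be sketched in Python to show why distinct-values matters: counting per author name is fine, but iterating over every author occurrence reports a prolific author once per occurrence. This is an ElementTree emulation on hypothetical data chosen so each query has exactly one answer; the function names are illustrative, and dict.fromkeys is used as an order-preserving stand-in for distinct-values.

```python
import xml.etree.ElementTree as ET

# Hypothetical data chosen so that each query has exactly one answer.
BIB = """<bib>
  <book><title>t1</title><author>A</author><author>B</author><author>C</author><author>D</author></book>
  <book><title>t2</title><author>A</author></book>
  <book><title>t3</title><author>A</author></book>
  <book><title>t4</title><author>A</author><author>B</author></book>
</bib>"""


def books_with_many_authors(bib, n=3):
    # FOR $x IN /bib/book WHERE count($x/author) > 3 RETURN $x
    return [x for x in bib.findall("book") if len(x.findall("author")) > n]


def prolific_authors(bib, n=3, distinct=True):
    names = [a.text for a in bib.iterfind("book/author")]  # $b/book/author/text()
    if distinct:
        names = list(dict.fromkeys(names))                   # distinct-values(...)
    books = bib.findall("book")
    # WHERE count($b/book[author/text()=$a]) > 3
    return [a for a in names
            if sum(1 for b in books
                   if any(x.text == a for x in b.findall("author"))) > n]


bib = ET.fromstring(BIB)
print([b.findtext("title") for b in books_with_many_authors(bib)])  # ['t1']
print(prolific_authors(bib))                  # ['A']
print(prolific_authors(bib, distinct=False))  # ['A', 'A', 'A', 'A']
```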


XQuery

Find books whose price is larger than average:

FOR $b IN document("bib.xml")/bib
LET $a := avg($b/book/price/text())
FOR $x IN $b/book
WHERE $x/price/text() > $a
RETURN $x
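The LET clause computes the average once and reuses it for every book, instead of recomputing it inside the loop. A Python sketch of the same plan, assuming (as the query does) that prices are <price> child elements; the data and the name above_average are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical prices, stored as <price> child elements as the query assumes.
BIB = """<bib>
  <book><title>a</title><price>30</price></book>
  <book><title>b</title><price>55</price></book>
  <book><title>c</title><price>80</price></book>
  <book><title>d</title></book>
</bib>"""


def above_average(bib):
    prices = [float(p.text) for p in bib.iterfind("book/price")]
    if not prices:                    # avg() of an empty sequence is empty: nothing qualifies
        return []
    avg = sum(prices) / len(prices)   # LET $a := avg(...): computed once, not per book
    # FOR $x IN $b/book WHERE $x/price/text() > $a RETURN $x
    # A book without a price never satisfies the comparison.
    return [x for x in bib.findall("book")
            if any(float(p.text) > avg for p in x.findall("price"))]


bib = ET.fromstring(BIB)
print([x.findtext("title") for x in above_average(bib)])  # ['c']
```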


FOR-WHERE-RETURN

• “Flatten” the authors, i.e. return a list of (author, title) pairs

FOR $b IN document("bib.xml")/bib/book,
    $x IN $b/title/text(),
    $y IN $b/author/text()
RETURN <answer>
         <title> { $x } </title>
         <author> { $y } </author>
       </answer>

Result:
<answer> <title> abc </title> <author> efg </author> </answer>
<answer> <title> abc </title> <author> hkj </author> </answer>
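Binding several variables in one FOR clause produces every combination of their values per book, which is what flattens the nested authors into (title, author) pairs. The Python sketch below uses nested loops on hypothetical data reusing the placeholder names from the result; the name flatten is illustrative. A book with no title produces no tuples at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; the second book has no title and therefore yields nothing.
BIB = """<bib>
  <book><title>abc</title><author>efg</author><author>hkj</author></book>
  <book><author>nobody</author></book>
</bib>"""


def flatten(bib):
    answers = []
    for b in bib.findall("book"):              # FOR $b IN .../bib/book,
        for x in b.findall("title"):           #     $x IN $b/title/text(),
            for y in b.findall("author"):      #     $y IN $b/author/text()
                answer = ET.Element("answer")
                ET.SubElement(answer, "title").text = x.text
                ET.SubElement(answer, "author").text = y.text
                answers.append(answer)
    return answers


for a in flatten(ET.fromstring(BIB)):
    print(ET.tostring(a, encoding="unicode"))
# <answer><title>abc</title><author>efg</author></answer>
# <answer><title>abc</title><author>hkj</author></answer>
```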


FOR-WHERE-RETURN

• For each author, return all titles of her/his books

FOR $b IN document("bib.xml")/bib,
    $x IN $b/book/author/text()
RETURN <answer>
         <author> { $x } </author>
         { FOR $y IN $b/book[author/text()=$x]/title
           RETURN $y }
       </answer>

Result:
<answer>
  <author> efg </author>
  <title> abc </title>
  <title> klm </title>
  . . . .
</answer>

What about duplicate authors?


FOR-WHERE-RETURN

• Same, but eliminate duplicate authors:

FOR $b IN document("bib.xml")/bib
LET $a := distinct-values($b/book/author/text())
FOR $x IN $a
RETURN <answer>
         <author> { $x } </author>
         { FOR $y IN $b/book[author/text()=$x]/title
           RETURN $y }
       </answer>


FOR-WHERE-RETURN

• Same thing:

FOR $b IN document("bib.xml")/bib,
    $x IN distinct-values($b/book/author/text())
RETURN <answer>
         <author> { $x } </author>
         { FOR $y IN $b/book[author/text()=$x]/title
           RETURN $y }
       </answer>


FOR-WHERE-RETURN

Find book titles by the coauthors of “Database Theory”:

FOR $b IN document("bib.xml")/bib,
    $x IN $b/book[title/text() = "Database Theory"],
    $y IN $b/book[author/text() = $x/author/text()]
RETURN <answer> { $y/title/text() } </answer>

Result: <answer> abc </answer> <answer> def </answer> <answer> abc </answer> <answer> ghk </answer>

Question: Why do we get duplicates?


Distinct-values

Same as before, but eliminate duplicates:

FOR $b IN document("bib.xml")/bib,
    $x IN $b/book[title/text() = "Database Theory"]/author/text(),
    $y IN distinct-values($b/book[author/text() = $x]/title/text())
RETURN <answer> { $y } </answer>

Result: <answer> abc </answer> <answer> def </answer> <answer> ghk </answer>

distinct-values = a function that eliminates duplicates

It must be applied to a collection of text values, not of elements; note how the query has changed.
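One way to see where repeated titles come from is to iterate per coauthor: a book written by two coauthors is then reached once through each of them. The Python sketch below does exactly that on hypothetical data, and then removes duplicates once, over the whole collection of title strings (text values, not elements). The name coauthor_titles is illustrative; dict.fromkeys keeps first-occurrence order, which is our choice, since XQuery leaves the order of distinct-values implementation dependent.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: all titles and author names are made up.
BIB = """<bib>
  <book><title>Database Theory</title><author>A</author><author>B</author></book>
  <book><title>abc</title><author>A</author><author>B</author></book>
  <book><title>def</title><author>A</author></book>
  <book><title>ghk</title><author>B</author></book>
</bib>"""


def coauthor_titles(bib, title, dedup=True):
    books = bib.findall("book")
    # The authors of the book with the given title.
    coauthors = [a.text for b in books if b.findtext("title") == title
                 for a in b.findall("author")]
    # One pass per coauthor: a book by two coauthors is reached twice.
    titles = [b.findtext("title") for x in coauthors for b in books
              if any(a.text == x for a in b.findall("author"))]
    # Eliminate duplicates over the strings, keeping the first copy of each.
    return list(dict.fromkeys(titles)) if dedup else titles


bib = ET.fromstring(BIB)
print(coauthor_titles(bib, "Database Theory", dedup=False))
# ['Database Theory', 'abc', 'def', 'Database Theory', 'abc', 'ghk']
print(coauthor_titles(bib, "Database Theory"))
# ['Database Theory', 'abc', 'def', 'ghk']
```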


SQL and XQuery Side-by-side

Product(pid, name, maker)
Company(cid, name, city)

Find all products made in Seattle

SQL:

SELECT x.name
FROM Product x, Company y
WHERE x.maker=y.cid and y.city='Seattle'

XQuery:

FOR $r IN document("db.xml")/db,
    $x IN $r/Product/row,
    $y IN $r/Company/row
WHERE $x/maker/text()=$y/cid/text() and $y/city/text() = "Seattle"
RETURN $x/name

Cool XQuery:

FOR $y IN /db/Company/row[city/text()="Seattle"],
    $x IN /db/Product/row[maker/text()=$y/cid/text()]
RETURN $x/name
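To check that the SQL and XQuery versions ask the same question, the Python sketch below runs the SQL join with sqlite3, publishes the same rows as XML in the table/row/column layout (using the element names Product and Company, as the queries do), and evaluates the "cool" XQuery plan with ElementTree. All rows are hypothetical, and the function names are illustrative.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical rows for Product(pid, name, maker) and Company(cid, name, city).
PRODUCTS = [(1, "Gizmo", 10), (2, "Widget", 20), (3, "Gadget", 10)]
COMPANIES = [(10, "Acme", "Seattle"), (20, "Initech", "Portland")]


def sql_answer():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Product(pid INTEGER, name TEXT, maker INTEGER)")
    con.execute("CREATE TABLE Company(cid INTEGER, name TEXT, city TEXT)")
    con.executemany("INSERT INTO Product VALUES (?, ?, ?)", PRODUCTS)
    con.executemany("INSERT INTO Company VALUES (?, ?, ?)", COMPANIES)
    rows = con.execute("SELECT x.name FROM Product x, Company y "
                       "WHERE x.maker = y.cid AND y.city = 'Seattle'").fetchall()
    con.close()
    return sorted(r[0] for r in rows)   # SQL gives no order without ORDER BY


def to_xml():
    # Publish each table as <Table><row><column>value</column>...</row></Table>.
    db = ET.Element("db")
    for table, cols, rows in (("Product", ("pid", "name", "maker"), PRODUCTS),
                              ("Company", ("cid", "name", "city"), COMPANIES)):
        t = ET.SubElement(db, table)
        for r in rows:
            row = ET.SubElement(t, "row")
            for col, val in zip(cols, r):
                ET.SubElement(row, col).text = str(val)
    return db


def xquery_answer(db):
    out = []
    # FOR $y IN /db/Company/row[city/text()="Seattle"],
    for y in db.iterfind("Company/row[city='Seattle']"):
        #   $x IN /db/Product/row[maker/text()=$y/cid/text()]
        for x in db.iterfind("Product/row"):
            if x.findtext("maker") == y.findtext("cid"):
                out.append(x.findtext("name"))   # RETURN $x/name
    return sorted(out)


print(sql_answer())             # ['Gadget', 'Gizmo']
print(xquery_answer(to_xml()))  # ['Gadget', 'Gizmo']
```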


<db>
  <Product>
    <row> <pid> ??? </pid> <name> ??? </name> <maker> ??? </maker> </row>
    <row> …. </row>
    …
  </Product>
  . . . .
</db>


XQuery

• FOR $x IN expr -- binds $x to each value in the list expr

• LET $x := expr -- binds $x to the entire list expr
  – Useful for common subexpressions and for aggregations


XQuery

Find all publishers that published more than 100 books:

<big_publishers>
  { FOR $p IN distinct-values(//publisher/text())
    LET $b := /db/book[publisher/text() = $p]
    WHERE count($b) > 100
    RETURN <publisher> { $p } </publisher> }
</big_publishers>

• $b is a collection of elements, not a single element
• count = a (aggregate) function that returns the number of elements
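This query is a group-by with a HAVING-style filter: LET binds the whole collection of a publisher's books, and WHERE tests its size. A Python sketch with ElementTree on generated, hypothetical data (101 books from one publisher, 5 from another); the name big_publishers mirrors the result element, and dict.fromkeys stands in for distinct-values.

```python
import xml.etree.ElementTree as ET


def big_publishers(db, threshold=100):
    out = ET.Element("big_publishers")
    books = db.findall("book")                                      # /db/book
    for p in dict.fromkeys(e.text for e in db.iter("publisher")):  # distinct-values(//publisher/text())
        b = [x for x in books if x.findtext("publisher") == p]     # LET $b := /db/book[...]
        if len(b) > threshold:                                      # WHERE count($b) > 100
            ET.SubElement(out, "publisher").text = p               # RETURN <publisher>{ $p }</publisher>
    return out


# Hypothetical data: 101 books from one publisher, 5 from another.
db = ET.Element("db")
for name, n in (("Big Press", 101), ("Small Press", 5)):
    for _ in range(n):
        ET.SubElement(ET.SubElement(db, "book"), "publisher").text = name

print(ET.tostring(big_publishers(db), encoding="unicode"))
# <big_publishers><publisher>Big Press</publisher></big_publishers>
```

The nested scan costs publishers × books comparisons; that is fine for a sketch, while a real engine might group in a single pass.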


XQuery

Summary:

• FOR-LET-WHERE-RETURN = FLWR

FOR/LET clauses → list of tuples → WHERE clause → list of tuples → RETURN clause → instance of the XQuery data model


FOR vs. LET

FOR
• Binds node variables → iteration

LET
• Binds collection variables → one value


FOR vs. LET

FOR $x IN /bib/book
RETURN <result> { $x } </result>

Returns:
<result> <book>...</book></result>
<result> <book>...</book></result>
<result> <book>...</book></result>
...

LET $x := /bib/book
RETURN <result> { $x } </result>

Returns:
<result> <book>...</book>
         <book>...</book>
         <book>...</book>
         ...
</result>
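The difference is easy to reproduce: FOR runs the RETURN clause once per book, while LET runs it once with the whole collection. The Python sketch below emulates both with ElementTree on a tiny hypothetical bib; copy.deepcopy models the fact that an element constructor copies the nodes it encloses, and the function names are illustrative.

```python
import copy
import xml.etree.ElementTree as ET

bib = ET.fromstring("<bib><book>b1</book><book>b2</book><book>b3</book></bib>")


def for_version(bib):
    # FOR $x IN /bib/book RETURN <result>{ $x }</result>
    # FOR iterates: one <result> per book.
    results = []
    for x in bib.findall("book"):
        r = ET.Element("result")
        r.append(copy.deepcopy(x))
        results.append(r)
    return results


def let_version(bib):
    # LET $x := /bib/book RETURN <result>{ $x }</result>
    # LET binds the whole collection once: a single <result> with every book.
    r = ET.Element("result")
    r.extend([copy.deepcopy(x) for x in bib.findall("book")])
    return [r]


for r in for_version(bib):
    print(ET.tostring(r, encoding="unicode"))  # three one-book <result> elements
for r in let_version(bib):
    print(ET.tostring(r, encoding="unicode"))  # one <result> with all three books
```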


Collections in XQuery

• Ordered and unordered collections
  – /bib/book/author/text() = an ordered collection: the result is in document order
  – distinct-values(/bib/book/author/text()) = an unordered collection: the output order is implementation dependent

• LET $a := /bib/book   ($a is a collection)

• $b/author   (a collection: several authors...)

RETURN <result> { $b/author } </result>

Returns:
<result> <author>...</author>
         <author>...</author>
         <author>...</author>
         ...
</result>