XQuery

XML and Web Data - Simon Fraser University · CMPT 354: Database I -- XQuery 3 Hierarchical Structure PersonList Student Title Contents Person Person Name: John Doe Id: 111111111

Jul 17, 2020


dariahiddleston

XQuery


XML Example

[Figure: annotated XML example with callouts for the mandatory statement, the root element, XML elements, element names, and element content]


Hierarchical Structure

PersonList Student
  Title
  Contents
    Person
      Name: John Doe
      Id: 111111111
      Address
        Number: 123
        Street: Main St
    Person
      Name: Joe Public
      Id: 666666666
      Address
        Number: 666
        Street: Hollow Rd
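The hierarchy on this slide can be written as markup and walked programmatically. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree. The markup itself is an assumption: the slide shows only the tree shape, so here every label becomes an element, the Person nodes are placed under Contents, and the "Student" label on the root is left out.

```python
import xml.etree.ElementTree as ET

# Assumed markup for the tree on the slide (every label becomes an element).
PERSON_LIST = """
<PersonList>
  <Title/>
  <Contents>
    <Person>
      <Name>John Doe</Name>
      <Id>111111111</Id>
      <Address><Number>123</Number><Street>Main St</Street></Address>
    </Person>
    <Person>
      <Name>Joe Public</Name>
      <Id>666666666</Id>
      <Address><Number>666</Number><Street>Hollow Rd</Street></Address>
    </Person>
  </Contents>
</PersonList>
"""


def outline(elem, depth=0):
    """Return one indented line per element, mirroring the tree layout."""
    text = (elem.text or "").strip()
    lines = ["  " * depth + elem.tag + (f": {text}" if text else "")]
    for child in elem:  # iterating an element yields its subelements in order
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(PERSON_LIST)
tree_lines = outline(root)
print("\n".join(tree_lines))
```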


Querying XML Documents

• XPath
  – Simple and efficient
  – Based on path expressions
• XSLT
  – A full-blown programming language with powerful query capabilities
• XQuery
  – SQL-style query language
  – Has the most powerful and elegant query capabilities


XPath Data Model

An attribute is not a child of its parent node. That is, if A is an attribute of P, then P is a parent of A, but A is not a child of P.

• e-children: subelements
• a-children: attributes
• t-children: text in an element


XPath

• An XPath expression takes a document tree and returns a set of nodes in the tree
• Basic syntax – the UNIX file naming schema
  – / – the root node
  – . – the current node
  – .. – the parent node
  – Absolute path expression
  – Relative path expression

/Students/Student/CrsTaken

Suppose the current node is Name. Then ./First and First return the same results
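To try the path syntax above, here is a minimal Python sketch using xml.etree.ElementTree, which implements only a subset of XPath. The sample document is an assumption that follows the element names on the slides. ElementTree evaluates paths relative to the element they are called on, so the absolute path /Students/Student/CrsTaken is written as Student/CrsTaken starting from the Students element.

```python
import xml.etree.ElementTree as ET

# Assumed sample document; element names follow the slide's examples.
DOC = """
<Students>
  <Student>
    <Name><First>John</First><Last>Doe</Last></Name>
    <CrsTaken CrsCode="CS308" Semester="F1997"/>
    <CrsTaken CrsCode="MAT123" Semester="F1997"/>
  </Student>
  <Student>
    <Name><First>Bart</First><Last>Simpson</Last></Name>
    <CrsTaken CrsCode="CS308" Semester="F1994"/>
  </Student>
</Students>
"""
root = ET.fromstring(DOC)  # root is the <Students> element

# /Students/Student/CrsTaken, expressed relative to <Students>.
courses = root.findall("Student/CrsTaken")

# With Name as the current node, ./First and First select the same nodes.
name = root.find("Student/Name")
with_dot = name.findall("./First")
without_dot = name.findall("First")

# .. steps from the current node up to its parent.
parent_of_name = root.find("Student/Name/..")

print(len(courses), [e.text for e in with_dot], parent_of_name.tag)
```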


Accessing Attributes and Text

• Attributes: use symbol @
• Text: use text()
• Comment: use comment()

/Students/Student/CrsTaken/@CrsCode returns {CS308, MAT123}

/Students/Student/Name/First/text() returns {John, Bart}

/comment()
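The attribute and text queries can be approximated in Python as follows. This is a sketch under two assumptions: the sample document is invented to reproduce the result sets quoted on the slide, and, because xml.etree.ElementTree does not accept @CrsCode or text() as a final path step, the values are read through the element API instead. Comments are not covered, since the default parser discards them.

```python
import xml.etree.ElementTree as ET

# Assumed sample document, chosen to reproduce the results on the slide.
DOC = """
<Students>
  <Student>
    <Name><First>John</First><Last>Doe</Last></Name>
    <CrsTaken CrsCode="CS308" Semester="F1997"/>
    <CrsTaken CrsCode="MAT123" Semester="F1997"/>
  </Student>
  <Student>
    <Name><First>Bart</First><Last>Simpson</Last></Name>
    <CrsTaken CrsCode="CS308" Semester="F1994"/>
  </Student>
</Students>
"""
root = ET.fromstring(DOC)

# /Students/Student/CrsTaken/@CrsCode: read the attribute of each element.
codes = {c.get("CrsCode") for c in root.findall("Student/CrsTaken")}

# /Students/Student/Name/First/text(): read the text content of each element.
first_names = {f.text for f in root.findall("Student/Name/First")}

print(sorted(codes), sorted(first_names))
```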


Advanced Navigation in XPath

• Select specific nodes
  – /Students/Student[1]/CrsTaken[2]: the second course taken by the first student
  – /Students/Student/CrsTaken[last()]: the last course taken by each student (the position test is applied per Student, not to the document as a whole)
• Wildcard
  – //: descendant or self
    • //CrsTaken: select all CrsTaken elements in the tree
    • .//CrsTaken: search all descendants of the current node for CrsTaken elements
  – *: collect all e-children of a node irrespective of type
    • Student/*: select all e-children of the Student children of the current node
    • /*//*: select all e-grandchildren of the root and their e-descendants
  – @*: select all attributes
    • CrsTaken/@*: select all attributes of the CrsTaken nodes that sit below the current node
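The position tests and wildcards can be checked on a small document. The sketch below uses Python's xml.etree.ElementTree, which supports [n], [last()], .// and * (paths are written relative to the Students element). The document and the helper name codes are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample document with two students and two courses each.
DOC = """
<Students>
  <Student StudId="111111111">
    <Name><First>John</First><Last>Doe</Last></Name>
    <CrsTaken CrsCode="CS308" Semester="F1997"/>
    <CrsTaken CrsCode="MAT123" Semester="F1997"/>
  </Student>
  <Student StudId="987654321">
    <Name><First>Bart</First><Last>Simpson</Last></Name>
    <CrsTaken CrsCode="CS308" Semester="F1994"/>
    <CrsTaken CrsCode="CS305" Semester="F1995"/>
  </Student>
</Students>
"""
root = ET.fromstring(DOC)


def codes(elems):
    """Hypothetical helper: list the CrsCode attribute of each element."""
    return [e.get("CrsCode") for e in elems]


# Student[1]/CrsTaken[2]: second course of the first student.
second_of_first = codes(root.findall("Student[1]/CrsTaken[2]"))

# Student/CrsTaken[last()]: the position test is evaluated per Student.
last_per_student = codes(root.findall("Student/CrsTaken[last()]"))

# .//CrsTaken: every CrsTaken below the current node.
all_courses = codes(root.findall(".//CrsTaken"))

# Student/*: all element children of the Student children.
e_children = [e.tag for e in root.findall("Student/*")]

# CrsTaken/@* has no path form in ElementTree; .attrib holds all attributes.
first_course_attrs = root.find("Student/CrsTaken").attrib

print(second_of_first, last_per_student, all_courses, e_children)
```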


XPath Queries (1)

• Select all student nodes where the student has taken a course in Fall 1994
  – //Student[CrsTaken/@Semester="F1994"]
  – […]: selection condition
• Select elements based on the contents of an element rather than of an attribute
  – //Student[Status="Undergraduate" and starts-with(.//Last, "P") and not(.//Last=.//First)]
• Search for students who have van as part of their name
  – //Student[contains(concat(Name//text()), "van")]
• Select the students who have taken at least five courses
  – //Student[count(CrsTaken) >= 5]
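The selection conditions above use XPath functions (contains, count) that Python's xml.etree.ElementTree does not implement, so the sketch below spells each condition out as an ordinary Python test. It is meant only to show what each predicate checks; the sample document and the helper name ids are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed sample document.
DOC = """
<Students>
  <Student StudId="111111111">
    <Name><First>John</First><Last>Doe</Last></Name>
    <CrsTaken CrsCode="CS308" Semester="F1997"/>
  </Student>
  <Student StudId="222222222">
    <Name><First>Ludwig</First><Last>van Beethoven</Last></Name>
    <CrsTaken CrsCode="CS305" Semester="F1994"/>
    <CrsTaken CrsCode="CS308" Semester="S1995"/>
    <CrsTaken CrsCode="MAT123" Semester="F1995"/>
    <CrsTaken CrsCode="EE101" Semester="S1996"/>
    <CrsTaken CrsCode="MUS200" Semester="F1996"/>
  </Student>
</Students>
"""
root = ET.fromstring(DOC)
students = list(root.iter("Student"))


def ids(selected):
    """Hypothetical helper: list the StudId of each selected student."""
    return [s.get("StudId") for s in selected]


# [CrsTaken/@Semester="F1994"]: true if SOME CrsTaken child has that value.
took_f1994 = ids(
    s for s in students
    if any(c.get("Semester") == "F1994" for c in s.findall("CrsTaken"))
)

# "van" anywhere in the text below Name.
has_van = ids(s for s in students if "van" in "".join(s.find("Name").itertext()))

# [count(CrsTaken) >= 5]
five_or_more = ids(s for s in students if len(s.findall("CrsTaken")) >= 5)

print(took_f1994, has_van, five_or_more)
```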


XPath Queries (2)

• Select all the CrsTaken elements in the document that occur in Student elements with Status U4 and whose CrsCode attribute has the value CS305
  – //Student[Status="U4"]/CrsTaken[@CrsCode="CS305"]
• Select all Student elements such that the student took MAT123 in fall 1994
  – //Student[CrsTaken/@CrsCode="MAT123"][CrsTaken/@Semester="F1994"]
  – //Student[CrsTaken/@CrsCode="MAT123" and CrsTaken/@Semester="F1994"]
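The first query on this slide falls inside the XPath subset of Python's xml.etree.ElementTree and can be run as written (relative to the root element). The second is emulated in plain Python to bring out a subtlety: both forms on the slide test the course code and the semester independently, so they also accept a student who took MAT123 in one semester and some other course in fall 1994. Requiring both values on the same CrsTaken element is stricter. The document and the names loose and strict are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample document.
DOC = """
<Students>
  <Student StudId="111111111">
    <Status>U4</Status>
    <CrsTaken CrsCode="CS305" Semester="F1995"/>
    <CrsTaken CrsCode="MAT123" Semester="F1994"/>
  </Student>
  <Student StudId="222222222">
    <Status>U4</Status>
    <CrsTaken CrsCode="MAT123" Semester="S1996"/>
    <CrsTaken CrsCode="CS305" Semester="F1994"/>
  </Student>
  <Student StudId="333333333">
    <Status>U2</Status>
    <CrsTaken CrsCode="CS305" Semester="F1995"/>
  </Student>
</Students>
"""
root = ET.fromstring(DOC)

# //Student[Status="U4"]/CrsTaken[@CrsCode="CS305"]
u4_cs305 = root.findall(".//Student[Status='U4']/CrsTaken[@CrsCode='CS305']")
u4_cs305_semesters = [c.get("Semester") for c in u4_cs305]

# Code and semester tested independently, possibly on different CrsTaken nodes.
loose = [
    s.get("StudId") for s in root.iter("Student")
    if any(c.get("CrsCode") == "MAT123" for c in s.findall("CrsTaken"))
    and any(c.get("Semester") == "F1994" for c in s.findall("CrsTaken"))
]

# Both values required on the same CrsTaken node.
strict = [
    s.get("StudId") for s in root.iter("Student")
    if any(c.get("CrsCode") == "MAT123" and c.get("Semester") == "F1994"
           for c in s.findall("CrsTaken"))
]

print(u4_cs305_semesters, loose, strict)
```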


XPath Queries (3)

• Suppose Grade is an optional attribute of CrsTaken. Select all students who have a CrsTaken element with an explicitly specified Grade attribute (regardless of its value)
  – //Student[CrsTaken/@Grade]
• A union of elements of different types: the CrsTaken elements that pertain to the fall 1994 semester, and the Class elements that describe fall 1994 course offerings
  – //CrsTaken[@Semester="F1994"] | //Class[Semester="F1994"]
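A rough Python counterpart of these two queries, using xml.etree.ElementTree. The wrapper element Report and the document contents are assumptions. ElementTree has no | operator, so the union is approximated by concatenating two result lists, which does not reproduce XPath's document order or duplicate elimination in general.

```python
import xml.etree.ElementTree as ET

# Assumed document holding both student and class data under one root.
DOC = """
<Report>
  <Students>
    <Student StudId="111111111">
      <CrsTaken CrsCode="CS305" Semester="F1994" Grade="A"/>
    </Student>
    <Student StudId="222222222">
      <CrsTaken CrsCode="MAT123" Semester="F1995"/>
    </Student>
  </Students>
  <Classes>
    <Class><CrsCode>CS305</CrsCode><Semester>F1994</Semester></Class>
    <Class><CrsCode>MAT123</CrsCode><Semester>F1995</Semester></Class>
  </Classes>
</Report>
"""
root = ET.fromstring(DOC)

# //Student[CrsTaken/@Grade]: the attribute only has to be present.
with_grade = [
    s.get("StudId") for s in root.iter("Student")
    if s.find("CrsTaken[@Grade]") is not None
]

# //CrsTaken[@Semester="F1994"] | //Class[Semester="F1994"]
union = (root.findall(".//CrsTaken[@Semester='F1994']")
         + root.findall(".//Class[Semester='F1994']"))
union_tags = [e.tag for e in union]

print(with_grade, union_tags)
```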


XPointer – A Smarter URL

• Allow the user to concatenate a URL and a path expression
• URL#xpointer(XPathExpr1)xpointer(XPathExpr2)…
  – The document at URL is found
  – XPathExpr1 is evaluated against the document
    • If a nonempty set of document tree nodes is returned, done
  – Otherwise XPathExpr2 is tried, and so on
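The fallback rule can be sketched in a few lines of Python. This is not an XPointer implementation: fetching the document and parsing the #xpointer(...) fragment are left out, the function name resolve_pointer is hypothetical, and the path expressions are limited to what xml.etree.ElementTree accepts.

```python
import xml.etree.ElementTree as ET


def resolve_pointer(root, path_exprs):
    """Return the nodes selected by the first expression with a nonempty result."""
    for expr in path_exprs:
        nodes = root.findall(expr)
        if nodes:  # nonempty set of nodes: done
            return nodes
    return []  # every expression selected nothing


# Assumed sample document.
root = ET.fromstring(
    "<Students><Student StudId='1'/><Student StudId='2'/></Students>"
)

# The first expression matches nothing, so the second one is tried.
result = resolve_pointer(root, [".//Instructor", "Student[2]"])
print([e.get("StudId") for e in result])
```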


XPointer – Example

http://www.foo.edu/report.xml#xpointer(//Student[2])

http://www.foo.edu/Report.xml#xpointer(//Student[CrsTaken/@CrsCode="MAT123" and CrsTaken/@Semester="F1994"])


XQuery

• Integration of the best features of XQL and XML-QL
  – XQL: an extension of XPath
  – XML-QL: an SQL-style query language

• XQuery uses XPath as a syntax for its path expressions


Selection

FOR variable declarations
WHERE condition
RETURN result

• FOR clause plays the same role as the FROM clause in SQL

• WHERE clause is borrowed from SQL with the same functionality

• RETURN clause is analogous to SELECT and specifies the templates for the result document


Example Document

Suppose the document is stored at http://xyz.edu/transcripts.xml


A Simple Query

• Find all students who have ever taken MAT123

FOR $t IN document("http://xyz.edu/transcripts.xml")//Transcript
WHERE $t/CrsTaken/@CrsCode="MAT123"
RETURN $t/Student

  – $t ranges over all Transcript nodes in the document
  – Output:

<Student StudId="111111111" Name="John Doe"/>
<Student StudId="123454321" Name="Joe Blow"/>

• Yield a well-formed XML document by wrapping the query in an element

<StudentList>
(
  FOR $t IN document("http://xyz.edu/transcripts.xml")//Transcript
  WHERE $t/CrsTaken/@CrsCode="MAT123"
  RETURN $t/Student
)
</StudentList>
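For readers who want to trace the query by hand, here is a rough Python equivalent built on xml.etree.ElementTree. It is a sketch, not XQuery: the transcript document is an assumption (the slide's example document was a figure that did not survive in this transcript), and the loop mirrors the FOR, WHERE and RETURN clauses one to one.

```python
import xml.etree.ElementTree as ET

# Assumed contents of transcripts.xml.
TRANSCRIPTS = """
<Transcripts>
  <Transcript>
    <Student StudId="111111111" Name="John Doe"/>
    <CrsTaken CrsCode="CS308" Semester="F1997" Grade="B"/>
    <CrsTaken CrsCode="MAT123" Semester="F1997" Grade="B"/>
  </Transcript>
  <Transcript>
    <Student StudId="987654321" Name="Bart Simpson"/>
    <CrsTaken CrsCode="CS305" Semester="F1995" Grade="C"/>
  </Transcript>
  <Transcript>
    <Student StudId="123454321" Name="Joe Blow"/>
    <CrsTaken CrsCode="MAT123" Semester="S1996" Grade="C"/>
  </Transcript>
</Transcripts>
"""
root = ET.fromstring(TRANSCRIPTS)

student_list = ET.Element("StudentList")  # wrapper for a well-formed result
for t in root.iter("Transcript"):  # FOR $t IN ...//Transcript
    taken = [c.get("CrsCode") for c in t.findall("CrsTaken")]
    if "MAT123" in taken:  # WHERE $t/CrsTaken/@CrsCode="MAT123"
        student = t.find("Student")  # RETURN $t/Student
        ET.SubElement(student_list, "Student", dict(student.attrib))

names = [s.get("Name") for s in student_list]
print(ET.tostring(student_list, encoding="unicode"))
```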


Reconstructing Using XQuery

• Create a class roster

<CrsTaken CrsCode="CS305" Semester="F1995" Grade="A"/>
<CrsTaken CrsCode="CS305" Semester="F1995" Grade="C"/>


Joining Two Documents

http://xyz.edu/classes.xml
http://xyz.edu/transcripts.xml


Outer Join

Another way to generate a class roster

FOR $c IN document("http://xyz.edu/classes.xml")//Class
RETURN
  <ClassRoster CrsCode=$c/@CrsCode Semester=$c/@Semester>
    $c/CrsName
    $c/Instructor
    (
      FOR $t IN document("http://xyz.edu/transcripts.xml")//Transcript
      WHERE $t/CrsTaken/@CrsCode=$c/@CrsCode
        AND $t/CrsTaken/@Semester=$c/@Semester
      RETURN $t/Student SORTBY($t/Student/@StudId)
    )
  </ClassRoster>
SORTBY($c/@CrsCode)

An empty class may also be listed!


Equijoin

FOR $c IN document("http://xyz.edu/classes.xml")//Class
WHERE document("http://xyz.edu/transcripts.xml")//CrsTaken[@CrsCode=$c/@CrsCode and @Semester=$c/@Semester]
RETURN
  <ClassRoster CrsCode=$c/@CrsCode Semester=$c/@Semester>
    $c/CrsName
    $c/Instructor
    (
      FOR $t IN document("http://xyz.edu/transcripts.xml")//Transcript
      WHERE $t/CrsTaken/@CrsCode=$c/@CrsCode
        AND $t/CrsTaken/@Semester=$c/@Semester
      RETURN $t/Student SORTBY($t/Student/@StudId)
    )
  </ClassRoster>
SORTBY($c/@CrsCode)
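The difference between the outer join and the equijoin comes down to whether a class with no matching transcript still produces a roster. The Python sketch below illustrates this with xml.etree.ElementTree. Both documents, the function name class_rosters and the keep_empty flag are assumptions; the sketch also matches course code and semester on the same CrsTaken element, which is slightly stricter than the inner WHERE clause as written on the slides.

```python
import xml.etree.ElementTree as ET

# Assumed contents of classes.xml and transcripts.xml.
CLASSES = ET.fromstring("""
<Classes>
  <Class CrsCode="MAT123" Semester="F1997"><CrsName>Algebra</CrsName></Class>
  <Class CrsCode="CS305" Semester="F1995"><CrsName>Databases</CrsName></Class>
  <Class CrsCode="ART101" Semester="F1997"><CrsName>Drawing</CrsName></Class>
</Classes>
""")
TRANSCRIPTS = ET.fromstring("""
<Transcripts>
  <Transcript>
    <Student StudId="987654321" Name="Bart Simpson"/>
    <CrsTaken CrsCode="CS305" Semester="F1995"/>
  </Transcript>
  <Transcript>
    <Student StudId="111111111" Name="John Doe"/>
    <CrsTaken CrsCode="MAT123" Semester="F1997"/>
    <CrsTaken CrsCode="CS305" Semester="F1995"/>
  </Transcript>
</Transcripts>
""")


def class_rosters(classes, transcripts, keep_empty):
    """Return [((CrsCode, Semester), [StudId, ...]), ...] sorted by CrsCode."""
    rosters = []
    # Outer FOR over classes, with SORTBY($c/@CrsCode).
    for c in sorted(classes.iter("Class"), key=lambda cls: cls.get("CrsCode")):
        key = (c.get("CrsCode"), c.get("Semester"))
        # Inner FOR over transcripts, with SORTBY($t/Student/@StudId).
        members = sorted(
            t.find("Student").get("StudId")
            for t in transcripts.iter("Transcript")
            if any((k.get("CrsCode"), k.get("Semester")) == key
                   for k in t.findall("CrsTaken"))
        )
        if members or keep_empty:  # the equijoin drops classes nobody took
            rosters.append((key, members))
    return rosters


outer = class_rosters(CLASSES, TRANSCRIPTS, keep_empty=True)
equi = class_rosters(CLASSES, TRANSCRIPTS, keep_empty=False)
print(outer)
print(equi)
```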


Semantics – FOR Clause

• Specify the documents to be used in the query
• Declare variables
• Bind each variable to its range
  – The range is an ordered set of document nodes specified by an XQuery expression
  – An XPath expression, a query or a function that returns a list of nodes
  – If variables $a and $b bind with nodes <v, w> and <x, y, z>, respectively, then the following ordered list of tuples is produced: <v, x>, <v, y>, <v, z>, <w, x>, <w, y>, <w, z>
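The tuple order described in the last bullet is that of an ordered Cartesian product in which the later variable varies fastest. A minimal Python illustration, with plain strings standing in for document nodes (an assumption made for brevity):

```python
from itertools import product

# Ranges bound to $a and $b; strings stand in for document nodes.
a_range = ["v", "w"]
b_range = ["x", "y", "z"]

# product() keeps the order of both inputs and varies the last one fastest.
tuples = list(product(a_range, b_range))
print(tuples)
```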


Semantics – WHERE and RETURN

• WHERE clause: select an ordered sublist from the original list of tuples by filtering the tuples of bindings

• RETURN clause: apply to every surviving tuple of bindings, and create fragments for the output document
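Taken together, the three clauses behave like a generate, filter, construct pipeline. The sketch below expresses that reading in Python. The function name flwr and its parameters are hypothetical, and the ranges are assumed to be already computed lists rather than the results of path expressions.

```python
from itertools import product


def flwr(ranges, where, ret):
    """Evaluate FOR/WHERE/RETURN over already computed variable ranges."""
    bindings = product(*ranges)  # FOR: ordered tuples of bindings
    surviving = (b for b in bindings if where(*b))  # WHERE: ordered sublist
    return [ret(*b) for b in surviving]  # RETURN: one fragment per tuple


fragments = flwr(
    ranges=[[1, 2, 3], [10, 20]],
    where=lambda a, b: a != 2,
    ret=lambda a, b: f"<Pair a='{a}' b='{b}'/>",
)
print(fragments)
```

The order of the output fragments follows the order of the surviving tuples, which in turn follows the order of the ranges.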


Summary and To-Do-List

• XPath
• XQuery
• Assignment 3