XML and Internet Databases

Outline
• Background: documents (SGML/HTML) and databases (structured and semistructured data)
• XML Basics and Document Type Descriptors
• XML query languages: XPath, XQuery

Part I: Background
What’s the difference between the world of documents and information retrieval and the world of databases and query interfaces?

Documents vs Databases

Document world
> plenty of small documents
> usually static
> implicit structure: section, paragraph, toc
> tagging
> human friendly
> content: form/layout, annotation
> Paradigms: “Save as”
> meta-data: author name, date, subject

Database world
> a few large databases
> usually dynamic
> explicit structure (schema)
> records
> machine friendly
> content: schema, data, methods
> Paradigms: Atomicity, Consistency, Isolation, Durability (ACID)
> meta-data: schema description

What to do with them

Documents
• editing
• printing
• spell-checking
• counting words
• retrieving (IR)
• searching

Database
• updating
• cleaning
• querying
• composing/transforming

HTML
• Lingua franca for publishing hypertext on the World Wide Web.
• HTML is widely used for formatting and structuring Web documents.
• Designed to describe how a Web browser should arrange text, images and push-buttons on a page.
• Easy to learn, but does not convey structure and meaning of data in the Web pages.
• Fixed tag set.

<HTML>
<HEAD><TITLE>Welcome to the XML course</TITLE></HEAD>
<BODY>
<H1>Introduction</H1>
<IMG SRC="dragon.jpeg" WIDTH="200" HEIGHT="150">
</BODY>
</HTML>

In this example: <TITLE> is an opening tag, “Welcome to the XML course” is text (PCDATA), </TITLE> is a closing tag, <IMG ...> is a “bachelor” tag (it has no closing tag), SRC is an attribute name and "dragon.jpeg" is an attribute value.

Semistructured data
1. Information integration: important new application that motivates what follows.
2. Semistructured data: a new data model designed to cope with problems of information integration.
3. XML: a new Web standard that is essentially semistructured data.
4. XQUERY: an emerging standard query language for XML data.

Information Integration
Problem: related data exists in many places. The sources talk about the same things, but differ in model, schema, conventions (e.g., terminology).
Example: In the real world, every bar has its own database.
• Some may have relations like beer-price; others have a Microsoft Word file from which the menu is printed.
• Some keep phones of manufacturers but not addresses.
• Some distinguish beers and ales; others do not.

Two approaches
1. Warehousing: Make copies of the information at each data source in a central location.
– Reconstruct data daily/weekly/monthly, but do not try to keep it up-to-date.
2. Mediation: Create a view of all information, but do not make copies.
– Answer queries by sending appropriate queries to sources.

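Warehousing
[Figure: the user sends a query to the Warehouse and receives a result. A Combiner loads the Warehouse from two Wrappers, one over DB1 and one over DB2.]

Mediation
[Figure: the user sends a query to the Mediator and receives a result. The Mediator sends queries to two Wrappers, one over DB1 and one over DB2; each Wrapper queries its database and passes the result back.]
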
Semistructured Data
• A different kind of data model, more suited to information-integration applications than either relational or OO.
– Think of “objects,” but with the type of an object its own business rather than the business of the class to which it belongs.
– Allows information from several sources, with related but different properties, to be fit together in one whole.
• Major application: XML documents.

Graph Representation of Semistructured Data
• Nodes = objects.
• Nodes connected in a general rooted graph structure.
• Labels on arcs.
• Atomic values on leaf nodes.
• Big deal: no restriction on labels (roughly = attributes).
– Zero, one, or many children of a given label type are all OK.

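A minimal sketch of this graph model in Python, under the assumption that a node is either an atomic value (a leaf) or a list of (label, child) arcs. The helper name `children` and the sample data are illustrative, not taken from the slides.

```python
# Semistructured data as a rooted graph with labeled arcs (illustrative sketch).
# A node is either an atomic value (leaf) or a list of (label, child) arcs.

def children(node, label):
    """Return every child of `node` reached by an arc carrying `label`."""
    if not isinstance(node, list):      # atomic leaf: no outgoing arcs
        return []
    return [child for arc_label, child in node if arc_label == label]

# Hypothetical data: one bar and two beers. No schema fixes the label set,
# so one beer has a "prize" arc and the other has none.
joes = [("name", "Joe's"), ("addr", "Maple")]
bud = [("name", "Bud"), ("manf", "A.B."), ("servedAt", joes)]
mlob = [("name", "M'lob"), ("manf", "A.B."),
        ("prize", [("year", 1995), ("award", "Gold")])]
root = [("bar", joes), ("beer", bud), ("beer", mlob)]

beer_names = [children(b, "name")[0] for b in children(root, "beer")]
prize_counts = [len(children(b, "prize")) for b in children(root, "beer")]
print(beer_names, prize_counts)
```

Because `servedAt` points at the same `joes` object that the root reaches through `bar`, the structure is a graph rather than a tree.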
XML (Extensible Markup Language)
HTML uses tags for formatting (e.g., “italic”).
XML uses tags for semantics (e.g., “this is an address”).
• Two modes:
1. Well-formed XML: A document that obeys the “nested tags” rule and does not repeat an attribute within a tag is said to be well-formed. It allows you to invent your own tags, much like labels in semistructured data.
2. Valid XML involves a DTD (Document Type Definition) that tells the labels and gives a grammar for how they may be nested.

Well-Formed XML
1. Declaration = <? ... ?>.
– Normal declaration is
<?xml version="1.0" standalone="yes"?>
– “Standalone” means that the document does not rely on an external DTD.
2. Root tag surrounds the entire balance of the document.
– <FOO> is balanced by </FOO>, as in HTML.
3. Any balanced structure of tags OK.
– Option of empty-element tags such as <FOO/>, which need no separate closing tag (unlike HTML, where a tag like <P> may simply be left unclosed).

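The well-formedness rules can be checked mechanically with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested tags under a single root.
print(is_well_formed('<?xml version="1.0" standalone="yes"?><FOO><BAR/></FOO>'))
# Tags closed in the wrong order.
print(is_well_formed("<date><day>1</date></day>"))
# The same attribute repeated within one tag.
print(is_well_formed('<FOO x="1" x="2"/>'))
# An opening tag that is never closed.
print(is_well_formed("<P>some text"))
```

Note that this only tests well-formedness. `ElementTree` does not validate against a DTD.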
The Structure of XML
• XML consists of tags and text
• Tags come in pairs <date> ... </date>
• They must be properly nested
<date> <day> ... </day> ... </date> --- good
<date> <day> ... </date> ... </day> --- bad

XML text
XML has only one “basic” type -- text.
It is bounded by tags, e.g.
<title> The Big Sleep </title>
<year> 1935 </year> --- 1935 is still text
XML text is called PCDATA (for parsed character data). The characters are Unicode; every XML parser must accept the UTF-8 and UTF-16 encodings.

XML structure
Nesting tags can be used to express various structures. E.g., a tuple (record):
<person>
<name> Malcolm Atchison </name>
<tel> (215) 898 4321 </tel>
<email> [email protected] </email>
</person>

Terminology
The segment of an XML document between an opening and a corresponding closing tag is called an element.
<person>
<name> Malcolm Atchison </name>
<tel> (215) 898 4321 </tel>
<tel> (215) 898 4321 </tel>
<email> [email protected] </email>
</person>
[Figure callouts: the whole <person> ... </person> segment is an element; a nested segment such as <name> ... </name> is an element and a sub-element of person; a segment that does not run from an opening tag to its corresponding closing tag is not an element.]

XML is tree-like
person
  name: Malcolm Atchison
  tel: (215) 898 4321
  tel: (215) 898 4321
  email: [email protected]

A Complete XML Document
<?xml version="1.0"?>
<person>
<name> Malcolm Atchison </name>
<tel> (215) 898 4321 </tel>
<email> [email protected] </email>
</person>

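To make the tree view concrete, the sketch below parses a complete document with Python's standard `xml.etree.ElementTree` and prints its element tree. The helper `outline` is a hypothetical name, and the email value is a placeholder because the address is not legible in the source.

```python
import xml.etree.ElementTree as ET

# The email value below is a placeholder, not the address from the slide.
doc = """<?xml version="1.0"?>
<person>
  <name> Malcolm Atchison </name>
  <tel> (215) 898 4321 </tel>
  <email> user@example.org </email>
</person>"""

def outline(elem, depth=0):
    """Return indented lines: one per element, plus one per non-empty text."""
    lines = ["  " * depth + elem.tag]
    text = (elem.text or "").strip()
    if text:
        lines.append("  " * (depth + 1) + repr(text))
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(doc)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Elements become interior nodes and PCDATA becomes the leaves, matching the tree drawn on the earlier slide.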
Example
[Figure: a rooted graph of semistructured data. Arc labels: bar, beer, name, manf, prize, year, award, servedAt, addr. Leaf values: Bud, A.B., M’lob, 1995, Gold, Joe’s, Maple.]

Example
<?xml version="1.0" standalone="yes"?>
<BARS>
<BAR><NAME>Joe's Bar</NAME>
<BEER><NAME>Bud</NAME>
<PRICE>2.50</PRICE></BEER>
<BEER><NAME>Miller</NAME>
<PRICE>3.00</PRICE></BEER>
</BAR>
<BAR> ...
</BARS>

Representing relational DBs: Two ways
projects: title | budget | managedBy
employees: name | ssn | age

Project and Employee relations in XML
Projects and employees are intermixed
<db>
<project>
<title> Pattern recognition </title>
<budget> 10000 </budget>
<managedBy> Joe </managedBy>
</project>
<employee>
<name> Joe </name>
<ssn> 344556 </ssn>
<age> 34 </age>
</employee>
<employee>
<name> Sandra </name>
<ssn> 2234 </ssn>
<age> 35 </age>
</employee>
<project>
<title> Auto guided vehicle </title>
<budget> 70000 </budget>
<managedBy> Sandra </managedBy>
</project>
:
</db>

Project and Employee relations in XML (cont’d)
Employees follow projects
<db>
<projects>
<project>
<title> Pattern recognition </title>
<budget> 10000 </budget>
<managedBy> Joe </managedBy>
</project>
<project>
<title> Auto guided vehicles </title>
<budget> 70000 </budget>
<managedBy> Sandra </managedBy>
</project>
:
</projects>
<employees>
<employee>
<name> Joe </name>
<ssn> 344556 </ssn>
<age> 34 </age>
</employee>
<employee>
<name> Sandra </name>
<ssn> 2234 </ssn>
<age> 35 </age>
</employee>
:
</employees>
</db>

Project and Employee relations in XML (cont’d)
Or without “separator” tags …
<db>
<projects>
<title> Pattern recognition </title>
<budget> 10000 </budget>
<managedBy> Joe </managedBy>
<title> Auto guided vehicles </title>
<budget> 70000 </budget>
<managedBy> Sandra </managedBy>
:
</projects>
<employees>
<name> Joe </name>
<ssn> 344556 </ssn>
<age> 34 </age>
<name> Sandra </name>
<ssn> 2234 </ssn>
<age> 35 </age>
:
</employees>
</db>

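The “employees follow projects” encoding can be produced mechanically from relational rows. This is a sketch using Python's standard `xml.etree.ElementTree`; the function `relation_to_xml` and the dictionary-per-row representation are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def relation_to_xml(container_tag, row_tag, rows):
    """Wrap each row (a dict of column -> value) in a <row_tag> element."""
    container = ET.Element(container_tag)
    for row in rows:
        row_elem = ET.SubElement(container, row_tag)
        for column, value in row.items():
            # One sub-element per column; every value becomes text.
            ET.SubElement(row_elem, column).text = str(value)
    return container

projects = [
    {"title": "Pattern recognition", "budget": 10000, "managedBy": "Joe"},
    {"title": "Auto guided vehicles", "budget": 70000, "managedBy": "Sandra"},
]
employees = [
    {"name": "Joe", "ssn": 344556, "age": 34},
    {"name": "Sandra", "ssn": 2234, "age": 35},
]

db = ET.Element("db")
db.append(relation_to_xml("projects", "project", projects))
db.append(relation_to_xml("employees", "employee", employees))
xml_text = ET.tostring(db, encoding="unicode")
print(xml_text)
```

Dropping the `row_tag` level would give the variant without “separator” tags, at the cost of having to recover row boundaries by counting columns.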
Attributes
An (opening) tag may contain attributes. These are typically used to describe the content of an element.
<entry>
<word language = "en"> cheese </word>
<word language = "fr"> fromage </word>
<word language = "ro"> branza </word>
<meaning> A food made … </meaning>
</entry>

Attributes (cont’d)
Another common use for attributes is to express dimension or type
<picture>
<height dim = "cm"> 2400 </height>
<width dim = "in"> 96 </width>
<data encoding = "gif" compression = "zip">
M05-.+C$@02!G96YE<FEC ...
</data>
</picture>

Using IDs
<family>
<person id="jane" mother="mary" father="john">
<name> Jane Doe </name>
</person>
<person id="john" children="jane jack">
<name> John Doe </name>
</person>
<person id="mary" children="jane jack">
<name> Mary Doe </name>
</person>
<person id="jack" mother="mary" father="john">
<name> Jack Doe </name>
</person>
</family>

An object-oriented schema
class Movie
( extent Movies, key title )
{
attribute string title;
attribute string director;
relationship set<Actor> casts
inverse Actor::acted_In;
attribute int budget;
} ;

class Actor
( extent Actors, key name )
{
attribute string name;
relationship set<Movie> acted_In
inverse Movie::casts;
attribute int age;
attribute set<string> directed;
} ;

An example
<db>
<movie id="m1">
<title>Waking Ned Divine</title>
<director>Kirk Jones III</director>
<casts idrefs="a1 a3"></casts>
<budget>100,000</budget>
</movie>
<movie id="m2">
<title>Dragonheart</title>
<director>Rob Cohen</director>
<casts idrefs="a2 a9 a21"></casts>
<budget>110,000</budget>
</movie>
<movie id="m3">
<title>Moondance</title>
<director>Dagmar Hirtz</director>
<casts idrefs="a1 a8"></casts>
<budget>90,000</budget>
</movie>
<actor id="a1">
<name>David Kelly</name>
<acted_In idrefs="m1 m3 m78"></acted_In>
</actor>
<actor id="a2">
<name>Sean Connery</name>
<acted_In idrefs="m2 m9 m11"></acted_In>
<age>68</age>
</actor>
<actor id="a3">
<name>Ian Bannen</name>
<acted_In idrefs="m1 m35"></acted_In>
</actor>
:
</db>

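The `idrefs` attributes turn the tree into a graph, but a plain XML parser does not follow them; the application has to. A minimal sketch, assuming the relationship element is named `casts` as in the object-oriented schema and using a trimmed copy of the data; `by_id` and `cast_names` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the movie database, for illustration only.
doc = """<db>
  <movie id="m1"><title>Waking Ned Divine</title><casts idrefs="a1 a3"/></movie>
  <actor id="a1"><name>David Kelly</name><acted_In idrefs="m1"/></actor>
  <actor id="a3"><name>Ian Bannen</name><acted_In idrefs="m1"/></actor>
</db>"""

root = ET.fromstring(doc)

# Index every element that carries an id attribute.
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

def cast_names(movie):
    """Follow the movie's casts/@idrefs pointers and return the actor names."""
    refs = movie.find("casts").get("idrefs", "").split()
    # References to ids that are absent from the document are skipped.
    return [by_id[ref].findtext("name") for ref in refs if ref in by_id]

print(cast_names(by_id["m1"]))
```

Nothing here checks that an id in `casts` really names an actor rather than a movie, which is the “IDREFs are untyped” weakness of DTDs in practice.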
Part II: Document Type Descriptors (DTD)
Imposing structure on XML documents

Document Type Descriptors
• Document Type Descriptors (DTDs) impose structure on an XML document.
• There is some relationship between a DTD and a schema, but it is not close – there is still a need for additional “typing” systems.
• The DTD is a syntactic specification.

Document Type Definitions (DTD)
Essentially a grammar describing the legal nesting of tags.
• Intention is that DTDs will be standards for a domain, used by everyone preparing or using data in that domain.
– Example: a DTD for describing protein structure; a DTD for describing bar menus, etc.
Gross Structure of a DTD:
<!DOCTYPE root tag [
<!ELEMENT name (components)>
more elements
]>

Example: An Address Book
<person>
<name> MacNiel, John </name> --- exactly one name
<greet> Dr. John MacNiel </greet> --- at most one greeting
<addr> 1234 Huron Street </addr> --- as many address lines as needed (in order)
<addr> Rome, OH 98765 </addr>
<tel> (321) 786 2543 </tel> --- mixed telephones and faxes
<fax> (321) 786 2543 </fax>
<tel> (321) 786 2543 </tel>
<email> [email protected] </email> --- as many as needed
</person>

Specifying the structure
• name -- to specify a name element
• greet? -- to specify an optional (0 or 1) greet element
• name, greet? -- to specify a name followed by an optional greet

Specifying the structure (cont)
• addr* -- to specify 0 or more address lines
• tel | fax -- a tel or a fax element
• (tel | fax)* -- 0 or more repeats of tel or fax
• email* -- 0 or more email elements

A DTD for the address book
<!DOCTYPE addressbook [
<!ELEMENT addressbook (person*)>
<!ELEMENT person (name, greet?, addr*, (fax | tel)*, email*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT greet (#PCDATA)>
<!ELEMENT addr (#PCDATA)>
<!ELEMENT tel (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>

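A DTD content model is a regular expression over child element names, so the person rule can be checked with an ordinary regex engine. The sketch below hand-translates `(name, greet?, addr*, (fax | tel)*, email*)`; it is an illustration of the idea, not a DTD validator, and `PERSON_MODEL` and `matches_person_model` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# The person content model, rewritten over a string in which every child
# tag name is followed by one space ("name greet addr ...").
PERSON_MODEL = re.compile(r"name (greet )?(addr )*((fax|tel) )*(email )*")

def matches_person_model(person):
    """Check the sequence of child tags of <person> against the model."""
    child_tags = "".join(child.tag + " " for child in person)
    return PERSON_MODEL.fullmatch(child_tags) is not None

ok = ET.fromstring(
    "<person><name/><greet/><addr/><addr/><tel/><fax/><tel/><email/></person>")
no_name = ET.fromstring("<person><greet/></person>")
late_addr = ET.fromstring("<person><name/><tel/><addr/></person>")

print(matches_person_model(ok))         # name first, then the optional parts in order
print(matches_person_model(no_name))    # the required name is missing
print(matches_person_model(late_addr))  # an addr appears after a tel
```

The trailing space after each tag name keeps one tag from matching as a prefix of another.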
Two DTDs for the relational DB
<!DOCTYPE db [
<!ELEMENT db (projects, employees)>
<!ELEMENT projects (project*)>
<!ELEMENT employees (employee*)>
<!ELEMENT project (title, budget, managedBy)>
<!ELEMENT employee (name, ssn, age)>
...
]>

Summary of XML regular expressions
• Each element name is a tag.
• Its components are the tags that appear nested within, in the order specified.
• A -- the tag A occurs
• e1,e2 -- the expression e1 followed by e2
• e* -- 0 or more occurrences of e
• e? -- optional: 0 or 1 occurrences
• e+ -- 1 or more occurrences
• e1 | e2 -- either e1 or e2
• (e) -- grouping

Back to the object-oriented schema
class Movie
( extent Movies, key title )
{
attribute string title;
attribute string director;
relationship set<Actor> casts
inverse Actor::acted_In;
attribute int budget;
} ;

class Actor
( extent Actors, key name )
{
attribute string name;
relationship set<Movie> acted_In
inverse Movie::casts;
attribute int age;
attribute set<string> directed;
} ;

Schema.dtd
<!DOCTYPE db [
<!ELEMENT db (movie+, actor+)>
<!ELEMENT movie (title, director, casts, budget)>
<!ATTLIST movie id ID #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT director (#PCDATA)>
<!ELEMENT casts EMPTY>
<!ATTLIST casts idrefs IDREFS #REQUIRED>
<!ELEMENT budget (#PCDATA)>

Schema.dtd (cont’d)
<!ELEMENT actor (name, acted_In, age?, directed*)>
<!ATTLIST actor id ID #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT acted_In EMPTY>
<!ATTLIST acted_In idrefs IDREFS #REQUIRED>
<!ELEMENT age (#PCDATA)>
<!ELEMENT directed (#PCDATA)>
]>

Elements of a DTD
An element is a name (its tag) and a parenthesized description of tags within an element.
• Special case: (#PCDATA) after an element name means it is text.
Example
<!DOCTYPE BARS [
<!ELEMENT BARS (BAR*)>
<!ELEMENT BAR (NAME, BEER+)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT BEER (NAME, PRICE)>
<!ELEMENT PRICE (#PCDATA)>
]>

Example of (a)
<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS [
<!ELEMENT BARS (BAR*)>
<!ELEMENT BAR (NAME, BEER+)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT BEER (NAME, PRICE)>
<!ELEMENT PRICE (#PCDATA)>
]>
<BARS>
<BAR><NAME>Joe's Bar</NAME>
<BEER><NAME>Bud</NAME>
<PRICE>2.50</PRICE></BEER>
<BEER><NAME>Miller</NAME>
<PRICE>3.00</PRICE></BEER>
</BAR>
<BAR> ...
</BARS>

Example of (b)
Suppose our bars DTD is in file bar.dtd:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
<BAR><NAME>Joe's Bar</NAME>
<BEER><NAME>Bud</NAME>
<PRICE>2.50</PRICE></BEER>
<BEER><NAME>Miller</NAME>
<PRICE>3.00</PRICE></BEER>
</BAR>
<BAR> ...
</BARS>

Attribute Lists
• Opening tags can have “arguments” that appear within the tag, in analogy to constructs like <A HREF = ...> in HTML.
• Keyword !ATTLIST introduces a list of attributes and their types for a given element.
Example:
<!ELEMENT BAR (NAME, BEER*)>
<!ATTLIST BAR type (sushi | sports | other) #IMPLIED>
• Bar objects can have a type, and the value of that type is limited to the three strings shown.
• Example of use:
<BAR type = "sushi">
. . .
</BAR>

ID’s and IDREF’s
• ID stands for identifier. No two ID attributes with the same name may have the same value (of type CDATA).
• IDREF stands for identifier reference. Every value associated with an IDREF attribute must exist as an ID attribute value.
• These are pointers from one object to another, analogous to NAME = foo and HREF = #foo in HTML.
• Allows the structure of an XML document to be a general graph, rather than just a tree.
• An attribute of type ID can be used to give the object (string between opening and closing tags) a unique string identifier.
• An attribute of type IDREF refers to some object by its identifier.
• Also IDREFS to allow multiple object references within one tag. That is, IDREFS specifies several (0 or more) identifiers.

Example
Let us include in our Bars document type elements that are the manufacturers of beers, and have each beer object link, with an IDREF, to the proper manufacturer object.
<!DOCTYPE BARS [
<!ELEMENT BARS (BAR*)>
<!ELEMENT BAR (NAME, BEER+)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT MANF (ADDR)>
<!ATTLIST MANF name ID #REQUIRED>
<!ELEMENT ADDR (#PCDATA)>
<!ELEMENT BEER (NAME, PRICE)>
<!ATTLIST BEER manf IDREF #REQUIRED>
<!ELEMENT PRICE (#PCDATA)>
]>

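The two constraints that ID and IDREF impose, unique identifiers and no dangling references, can be checked by hand when no validating parser is available. The sketch below assumes the attribute names `name` (ID on MANF) and `manf` (IDREF on BEER) from the example; the function `check_references`, the wrapper element `DATA` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_references(root, id_attr, ref_attr):
    """Return (duplicate_ids, dangling_refs) for the two attribute names."""
    seen, duplicates = set(), []
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)   # the same identifier used twice
        seen.add(value)
    dangling = [elem.get(ref_attr) for elem in root.iter()
                if elem.get(ref_attr) is not None
                and elem.get(ref_attr) not in seen]
    return duplicates, dangling

# Hypothetical data: the second beer points at a manufacturer that is absent.
doc = ET.fromstring("""<DATA>
  <MANF name="ab"><ADDR>1 Main St</ADDR></MANF>
  <BEER manf="ab"><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
  <BEER manf="xyz"><NAME>Mystery</NAME><PRICE>1.00</PRICE></BEER>
</DATA>""")

duplicates, dangling = check_references(doc, "name", "manf")
print(duplicates, dangling)
```

A validating parser would reject the second BEER, since its `manf` value matches no ID in the document.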
Connecting the document with its DTD
In line:
<?xml version="1.0"?>
<!DOCTYPE db [<!ELEMENT ...> … ]>
<db> ... </db>
Another file:
<!DOCTYPE db SYSTEM "schema.dtd">
A URL:
<!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd">

DTDs vs. Schemas (or Types)
• By database (or programming language) standards DTDs are rather weak specifications.
– Only one base type -- PCDATA
– No useful “abstractions” e.g., sets
– IDREFs are untyped. You point to something, but you don’t know what!
– No constraints e.g., child is inverse of parent
– No methods
– Tag definitions are global
• Some of the XML extensions impose something like a schema or type on an XML document. We’ll see these later

Lots of possibilities for schemas
• XML Schema (under W3C’s spotlight)
• XDR (Microsoft’s BizTalk)
• SOX (Schema for Object-Oriented XML)
• Schematron
• DSD (AT&T Labs and BRICS)
• and more.

Some tools
• XML Authority
http://www.extensibility.com/tibco/solutions/xml_authority/index.htm
• XML Spy
http://www.xmlspy.com/download.html

Summary
• XML is a new data format. Its main virtues are widespread acceptance and the (important) ability to handle semistructured data (data without schema).
• DTDs provide some useful syntactic constraints on documents. As schemas they are weak.

Why a query language? Extracting, Restructuring, Integration, Browsing…
XML-QL
http://www.w3.org/TR/NOTE-xml-ql
http://db.cis.upenn.edu/XML-QL/
XPATH (part of a query language)
http://www.w3.org/TR/xpath
XSLT
http://www.w3.org/TR/xslt
http://www.mulberrytech.com/quickref/XSLTquickref.pdf
QUILT
http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html
http://db.cis.upenn.edu/Kweelt/

XPath
• Reasonably widely adopted -- in XML-Schema and query languages.
• Neither more expressive nor less expressive than regular path expressions
• Primary goal = to permit access to some nodes of a given document
• XPath main construct: axis navigation
• An XPath path consists of one or more navigation steps, separated by /
• A navigation step is a triplet: axis + node-test + list of predicates
• Examples
– /descendant::node()/child::author
– /descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]
• XPath also offers some shortcuts
– no axis means child
– // means /descendant-or-self::node()/

XPath - child axis navigation
• author is shorthand for child::author. Examples:
– aaa -- all the child nodes labeled aaa (1,3)
– aaa/bbb -- all the bbb grandchildren of aaa children (4)
– */bbb -- all the bbb grandchildren of any child (4,6)
– . -- the context node
– / -- the root node
[Figure: the context node has three children numbered 1 to 3 and four grandchildren numbered 4 to 7; nodes 1 and 3 are labeled aaa, and nodes 4 and 6 are labeled bbb.]

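The three child-axis examples can be tried with the limited XPath subset in Python's standard `xml.etree.ElementTree`. The tree below is an assumed reconstruction of the figure: only nodes 1, 3, 4 and 6 are fixed by the stated results, and the labels of nodes 2, 5 and 7, the `n` attribute and the `ctx` wrapper are illustrative choices.

```python
import xml.etree.ElementTree as ET

# Context node with children 1-3 and grandchildren 4-7; the n attribute
# records each node's number in the figure.
context = ET.fromstring(
    "<ctx>"
    "<aaa n='1'><bbb n='4'/><aaa n='5'/></aaa>"
    "<ccc n='2'><bbb n='6'/></ccc>"
    "<aaa n='3'><ccc n='7'/></aaa>"
    "</ctx>")

def numbers(path):
    """Evaluate `path` from the context node; return the matched node numbers."""
    return [int(elem.get("n")) for elem in context.findall(path)]

print(numbers("aaa"))      # children labeled aaa
print(numbers("aaa/bbb"))  # bbb grandchildren below aaa children
print(numbers("*/bbb"))    # bbb grandchildren below any child
```

`findall` returns matches in document order, which is why the node numbers come out ascending.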
XPath - child axis navigation (cont)
– /doc -- all the doc children of the root
– ./aaa -- all the aaa children of the context node (equivalent to aaa)
– text() -- all the text children of the context node
– node() -- all the children of the context node (includes text nodes, but not attribute nodes, which are not children)
– .. -- parent of the context node
– .// -- the context node and all its descendants
– // -- the root node and all its descendants
– //para -- all the para nodes in the document
– //text() -- all the text nodes in the document
– @font -- the font attribute node of the context node

Predicates
– [2] -- the second child node of the context node
– chapter[5] -- the fifth chapter child of the context node
– [last()] -- the last child node of the context node
– chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction"
– person[.//firstname = "joe"] -- the person children of the context node that have in their descendants a firstname element with string-value "joe"

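Positional and child-value predicates are part of the XPath subset that Python's standard `xml.etree.ElementTree` supports; a predicate on a descendant path such as `.//firstname = "joe"` is not, so the sketch below filters in Python for that case. The sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<book>"
    "<chapter><title>introduction</title></chapter>"
    "<chapter><title>basics</title></chapter>"
    "<chapter><title>queries</title></chapter>"
    "</book>")

second = book.find("chapter[2]")                      # positions start at 1
last = book.find("chapter[last()]")
intro = book.findall("chapter[title='introduction']")

staff = ET.fromstring(
    "<staff>"
    "<person><name><firstname>joe</firstname></name></person>"
    "<person><firstname>ann</firstname></person>"
    "</staff>")

# Equivalent of person[.//firstname = "joe"], written as a Python filter.
joes = [p for p in staff.findall("person")
        if any(f.text == "joe" for f in p.iter("firstname"))]

print(second.findtext("title"), last.findtext("title"), len(intro), len(joes))
```

A full XPath engine would evaluate all four expressions directly; the split here reflects only the limits of the standard library.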
Unions of Path Expressions
• employee | consultant -- the union of the employee and consultant nodes that are children of the context node
• For some reason person/(employee|consultant) is not allowed (in XPath 1.0)
• However person/node()[boolean(employee|consultant)] is allowed!!

Axis navigation
• So far, nearly all our expressions have moved us down the tree by moving to child nodes. Exceptions were
– . -- stay where you are
– / -- go to the root
– // -- all descendants of the root
– .// -- all descendants of the context node
• All other expressions have been abbreviations for child::…, e.g. para abbreviates child::para. child:: is an example of an axis
• XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
– Some of these (self, parent) describe single nodes, others describe sequences of nodes.

XPath Navigation Axes
[Figure: the axes drawn relative to the self node: ancestor, preceding-sibling, following-sibling, child, preceding, following, attribute, descendant, namespace.]

XPath abbreviated syntax
(nothing) = child::
@ = attribute::
// = /descendant-or-self::node()/
. = self::node()
.// = descendant-or-self::node()/
.. = parent::node()
/ = (document root)

Examples of XPath queries
• If the Company XML document is stored at the location www.company.com/info.xml then the first XPath expression can be written as
doc("www.company.com/info.xml")/company
• Some examples of XPath expressions on XML documents that follow the XML schema file Company are:
• /company -- returns the company root node and all its descendant nodes, that is, the whole XML document.
• /company/department -- returns all department nodes that are children of the company root, together with their descendant subtrees.
• //employee [employeeSalary gt 70000]/employeeName -- returns all employeeName nodes that are direct children of an employee node, such that the employee node has another child element employeeSalary whose value is gt 70000.
• /company/employee [employeeSalary gt 70000]/employeeName -- the same kind of result, but with the full path from the root spelled out instead of //.
• /company/project/projectWorker [hours ge 20.0] -- returns all projectWorker nodes under the path /company/project that have a child node hours with a value ge 20.0.

XQuery
• XPath allows us to write expressions that select nodes from a tree-structured XML document.
• XQuery permits the specification of more general queries on one or more XML documents.
• The typical form of a query in XQuery is known as a FLWR expression.
• FOR <variable bindings to individual nodes (elements)>
• LET <variable bindings to collection of nodes (elements)>
• WHERE <qualifier conditions>
• RETURN <query result specification>

Examples for XQuery queries
• FOR $x IN doc("www.company.com/info.xml")//employee [employeeSalary gt 70000]/employeeName
RETURN <res> $x/firstName, $x/lastName </res>
• FOR $x IN doc("www.company.com/info.xml")/company/employee
WHERE $x/employeeSalary gt 70000
RETURN <res> $x/employeeName/firstName, $x/employeeName/lastName </res>
• FOR $x IN doc("www.company.com/info.xml")/company/project [projectNumber = 5]/projectWorker,
$y IN doc("www.company.com/info.xml")/company/employee
WHERE $x/hours gt 20.0 AND $y/ssn = $x/ssn
RETURN <res> $y/employeeName/firstName, $y/employeeName/lastName, $x/hours </res>

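The second query (FOR over employees, WHERE on salary, RETURN a `<res>` element) maps almost line for line onto an ordinary loop. This is a sketch of the FLWR evaluation order in Python, not an XQuery engine; the company document and its values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical company data following the element names used in the queries.
company = ET.fromstring(
    "<company>"
    "<employee><employeeName><firstName>John</firstName>"
    "<lastName>Smith</lastName></employeeName>"
    "<employeeSalary>80000</employeeSalary></employee>"
    "<employee><employeeName><firstName>Ann</firstName>"
    "<lastName>Lee</lastName></employeeName>"
    "<employeeSalary>60000</employeeSalary></employee>"
    "</company>")

results = []
for x in company.findall("employee"):                  # FOR $x IN /company/employee
    if float(x.findtext("employeeSalary")) > 70000:    # WHERE $x/employeeSalary gt 70000
        res = ET.Element("res")                        # RETURN <res> ... </res>
        res.append(x.find("employeeName/firstName"))
        res.append(x.find("employeeName/lastName"))
        results.append(res)

for res in results:
    print(ET.tostring(res, encoding="unicode"))
```

Each binding of `$x` is tested by the WHERE clause, and one result element is built per binding that passes.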
XQuery
Emerging standard for querying XML documents.
Basic form:
FOR <variables ranging over sets of elements>
WHERE <condition>
RETURN <set of elements>;
• Sets of elements described by paths, consisting of:
1. URL, if necessary.
2. Element names forming a path in the semistructured data graph, e.g., //BAR/NAME = “start at any BAR node and go to a NAME child.”
3. Ending condition of the form
[<condition about subelements, @attributes, and values>]

Example
The file http://www.cse.ucsc.edu/bars.xml:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
<BAR type = "sports">
<NAME>Joe's Bar</NAME>
<BEER><NAME>Bud</NAME>
<PRICE>2.50</PRICE></BEER>
<BEER><NAME>Miller</NAME>
<PRICE>3.00</PRICE></BEER>
</BAR>
<BAR type = "sushi">
<NAME>Homma's</NAME>
<BEER><NAME>Sapporo</NAME>
<PRICE>4.00</PRICE></BEER>
</BAR> ...
</BARS>

XQUERY Query
• Query: Find the prices charged for Bud by sports bars that serve Miller.
FOR $ba IN document("http://www.cse.ucsc.edu/bars.xml")//BAR[@type = "sports"],
$be IN $ba/BEER[NAME = "Bud"]
WHERE $ba/BEER[NAME = "Miller"]
RETURN $be/PRICE;

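With two FOR variables the query behaves like a nested loop: `$be` ranges over the Bud entries of each sports bar bound to `$ba`. A sketch of that evaluation in Python over an inline copy of the bars document (the file is not fetched from its URL):

```python
import xml.etree.ElementTree as ET

# Inline copy of the two bars shown in the example file.
bars = ET.fromstring("""<BARS>
  <BAR type="sports"><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR type="sushi"><NAME>Homma's</NAME>
    <BEER><NAME>Sapporo</NAME><PRICE>4.00</PRICE></BEER>
  </BAR>
</BARS>""")

prices = []
for ba in bars.findall(".//BAR[@type='sports']"):          # FOR $ba IN ...//BAR[@type = "sports"]
    for be in ba.findall("BEER[NAME='Bud']"):              # $be IN $ba/BEER[NAME = "Bud"]
        if ba.find("BEER[NAME='Miller']") is not None:     # WHERE $ba/BEER[NAME = "Miller"]
            prices.append(be.findtext("PRICE"))            # RETURN $be/PRICE

print(prices)
```

The WHERE test is an existence check: the path is true when it selects at least one node.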
Conclusions
• XML is a data format for which there are an increasing number of useful tools for
– Constructing schemas
– Programming
– Querying
• Although it is likely that a query language will soon emerge as a standard, there is less agreement or understanding on how to store XML data efficiently.
• Many other database issues remain to make it useful for manipulating large amounts of data.