Module 3: XML Query and Manipulation

Key XML query and manipulation languages include
  XPath
  XQuery
  XSLT
  SQL/XML

© Munindar P. Singh, CSC 513, Spring 2010

Metaphors for Handling XML: 1

How we conceptualize XML documents determines our approach for handling them
  Text: an XML document is text
    Ignore any structure and perform simple pattern matches
  Tags: an XML document is text interspersed with tags
    Treat each tag as an "event" during reading a document, as in SAX (Simple API for XML)
    Construct regular expressions as in screen scraping
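The "tags as events" metaphor can be sketched with the SAX parser that ships with the JDK. This is a minimal illustration rather than code from the course: the class name SaxEvents and the sample document are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEvents {
    // Record one log entry per start tag and end tag, in the order the parser reports them.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                log.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                log.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        System.out.println(events("<Song><Title>So What</Title></Song>"));
        // prints [start:Song, start:Title, end:Title, end:Song]
    }
}
```

The handler never sees a tree: it only reacts to each tag as it streams past, which is what distinguishes this metaphor from tree-based handling.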
Axes are addressable node sets based on the document tree and the current node

Axes facilitate navigation of a tree
  Several are defined
  Mostly straightforward, but some of them order the nodes as the reverse of others
  Some captured via special notation
preceding: nodes that precede the start of the context node (not ancestors, attributes, namespace nodes)
following: nodes that follow the end of the context node (not descendants, attributes, namespace nodes)
preceding-sibling: preceding nodes that are children of the same parent, in reverse document order
following-sibling: following nodes that are children of the same parent

ancestor: proper ancestors, i.e., element nodes (other than the context node) that contain the context node, in reverse document order
descendant: proper descendants
ancestor-or-self: ancestors, including self (if it matches the next condition)
descendant-or-self: descendants, including self (if it matches the next condition)
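The effect of reverse document order shows up in positional predicates: on a reverse axis, position 1 is the node nearest to the context node. The following sketch uses the XPath 1.0 engine in the JDK (javax.xml.xpath); the class name AxesDemo and the Album document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class AxesDemo {
    // Hypothetical sample document, not taken from the slides.
    static final String XML =
        "<Album><Song>A</Song><Song>B</Song><Song>C</Song><Song>D</Song></Album>";

    // Evaluate an XPath 1.0 expression against XML and return its string value.
    static String eval(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The context node for the last step is the third Song ("C").
        // Reverse axis: position 1 is the nearest preceding sibling.
        System.out.println(eval("/Album/Song[3]/preceding-sibling::Song[1]"));   // B
        // Parenthesizing first yields a node set in document order, so [1] is now the earliest.
        System.out.println(eval("(/Album/Song[3]/preceding-sibling::Song)[1]")); // A
        // Forward axis: position 1 is the nearest following sibling.
        System.out.println(eval("/Album/Song[3]/following-sibling::Song[1]"));   // D
        // ancestor is a reverse axis too: position 1 is the parent.
        System.out.println(eval("name(/Album/Song[3]/ancestor::*[1])"));         // Album
    }
}
```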
Longer syntax: child::Song
Some captured via special notation
  self::*
  child::node(): node() matches all nodes
  preceding::*
  descendant::text()
  ancestor::Song
  /descendant-or-self::node()/, which abbreviates to //
  Compare /descendant-or-self::Song[1] (the first descendant Song in the document) and //Song[1] (every Song that is the first Song child of its parent)
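The difference between the two expressions compared above is easy to check by counting their results. This is a small sketch using the JDK's XPath 1.0 engine; the class name FirstSongDemo and the Lib document with two albums are assumptions for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class FirstSongDemo {
    // Hypothetical document: two albums, each with its own first Song.
    static final String XML =
        "<Lib><Album><Song>A</Song><Song>B</Song></Album>"
      + "<Album><Song>C</Song></Album></Lib>";

    // Return how many nodes the given XPath expression selects.
    static int count(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate("count(" + expr + ")",
                new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // [1] applies along the descendant-or-self axis: only the first Song overall.
        System.out.println(count("/descendant-or-self::Song[1]")); // 1
        // [1] applies along the child axis of each parent: Songs A and C.
        System.out.println(count("//Song[1]"));                    // 2
    }
}
```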
Enables pointing to specific parts of documents
Combines XPath with URLs
  URL to get to a document; XPath to walk down the document
Can be used to formulate queries, e.g.,
  Song-URL#xpointer(//Song[@genre="jazz"])
The part after # is a fragment identifier
The official query language for XML, now a W3C recommendation, as version 1.0
Given a non-XML syntax, easier on the human eye than XML
An XML rendition, XQueryX, is in the works

Pronounced “flower”
  For: iterative binding of variables over range of values
  Let: one-shot binding of variables over vector of values
  Where (optional)
  Order by (sort: optional)
  Return (required)
The where clause
  Selects the combinations of bindings that are desired
  Behaves like the where clause in SQL, in essence producing a join based on the Cartesian product

The let clause
  Like for, introduces one or more variables
  Like for, generates possible bindings for each variable
  Unlike for, generates the bindings as a list in one shot (no iteration)

The order by clause
  Specifies how the vector of variable bindings is to be sorted before the return clause
  Sorting expressions can be nested by separating them with commas

Variants allow specifying
  descending or ascending (default)
  empty greatest or empty least to accommodate empty elements
  stable sorts: stable order by
  collations: order by $t collation collation-URI (obscure, so skip)
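The JDK has no XQuery engine, so the following is only a Java analogy for how the clauses fit together, not an XQuery execution. It mirrors a hypothetical query that iterates over songs, keeps the jazz ones, sorts by title, and returns one item per binding. The class FlworSketch, the Song type, and the sample data are all assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlworSketch {
    // Hypothetical stand-in for a Song element with a genre attribute and a title child.
    static final class Song {
        final String title;
        final String genre;

        Song(String title, String genre) {
            this.title = title;
            this.genre = genre;
        }
    }

    static List<Song> sample() {
        return Arrays.asList(
            new Song("So What", "jazz"),
            new Song("Hey Jude", "rock"),
            new Song("Blue in Green", "jazz"));
    }

    static List<String> jazzTitles(List<Song> songs) {
        return songs.stream()                        // for $s in $songs
            .filter(s -> "jazz".equals(s.genre))     // where $s/@genre = "jazz"
            .map(s -> s.title)                       // let $t := $s/title
            .sorted()                                // order by $t (ascending is the default)
            .map(t -> "<li>" + t + "</li>")          // return <li>{$t}</li>
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(jazzTitles(sample()));
        // prints [<li>Blue in Green</li>, <li>So What</li>]
    }
}
```

The analogy is loose in one respect: the stream applies the filter before the binding of the title, whereas a FLWOR expression lists let before where.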
The for clause can be enhanced with a positional variable

A positional variable captures the position of the main variable in the given for clause with respect to the expression from which the main variable is generated

Introduce a positional variable via the at $var construct
A typical useful quantified expression would use variables that were introduced outside of its scope

The order of evaluation is implementation-dependent: enables optimization
  If some bindings produce errors, this can matter
some: trivially false if no variable bindings are found that satisfy it
every: trivially true if no variable bindings are found
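The behavior of some and every over an empty set of bindings matches that of anyMatch and allMatch on Java streams, which makes for a quick sanity check. This is an analogy in Java rather than XQuery itself. The class QuantifierSketch is an assumption, and limit plays the role of a variable introduced outside the quantified expression.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class QuantifierSketch {
    // Analogue of: some $x in $xs satisfies $x > $limit
    static boolean some(List<Integer> xs, int limit) {
        return xs.stream().anyMatch(x -> x > limit);
    }

    // Analogue of: every $x in $xs satisfies $x > $limit
    static boolean every(List<Integer> xs, int limit) {
        return xs.stream().allMatch(x -> x > limit);
    }

    public static void main(String[] args) {
        List<Integer> none = Collections.emptyList();
        System.out.println(some(none, 0));                   // false: no binding satisfies it
        System.out.println(every(none, 0));                  // true: no binding violates it
        System.out.println(some(Arrays.asList(1, 5), 3));    // true
        System.out.println(every(Arrays.asList(1, 5), 3));   // false
    }
}
```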
Analogous to Lisp, a general value can be treated as if it were a Boolean

An xs:boolean value maps to itself
An empty sequence maps to false
A sequence whose first member is a node maps to true
A numeric that is 0 or NaN maps to false, else to true
An empty string maps to false, others to true
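The mapping rules above can be written down as a single function. This sketch models an XQuery sequence as a Java List, which is an assumption made for the example, as is the class name EffectiveBoolean. It also rejects a multi-item sequence that does not start with a node, which XQuery treats as an error although the list above does not mention it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import org.w3c.dom.Node;

class EffectiveBoolean {
    // Compute the Boolean that a sequence maps to, following the rules listed above.
    static boolean ebv(List<?> seq) {
        if (seq.isEmpty()) {
            return false;                        // empty sequence
        }
        Object first = seq.get(0);
        if (first instanceof Node) {
            return true;                         // first member is a node
        }
        if (seq.size() > 1) {
            throw new IllegalArgumentException("no effective boolean value");
        }
        if (first instanceof Boolean) {
            return (Boolean) first;              // a Boolean maps to itself
        }
        if (first instanceof Number) {
            double d = ((Number) first).doubleValue();
            return d != 0 && !Double.isNaN(d);   // 0 and NaN map to false
        }
        if (first instanceof String) {
            return !((String) first).isEmpty();  // only the empty string maps to false
        }
        throw new IllegalArgumentException("no effective boolean value");
    }

    public static void main(String[] args) {
        System.out.println(ebv(Collections.emptyList()));     // false
        System.out.println(ebv(Arrays.asList(Double.NaN)));   // false
        System.out.println(ebv(Arrays.asList("false")));      // true: a non-empty string
    }
}
```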
Competitors in some ways, but
  Share a basis in XPath
  Consequently share the same data model
  Same type systems (in the type-sensitive versions)
XSLT got out first and has a sizable following, but XQuery has strong backing among vendors and researchers

XQuery is geared for querying databases
  Supported by major relational DBMS vendors in their XML offerings
  Supported by native XML DBMSs
  Offers superior coverage of processing joins
  Is more logical (like SQL) and potentially more optimizable

XSLT is geared for transforming documents
  Is functional rather than declarative
  Based on template matching
If no pattern is specified, apply recursively on et-children via <xsl:apply-templates/>
By default, if no other template matches, recursively apply to et-children of current node (ignores attributes) and to root:

<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>
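The default rule can be observed with the XSLT 1.0 processor bundled with the JDK (javax.xml.transform). In this sketch the stylesheet defines a template only for Title, so Song and Artist are handled by the built-in rules. The class name BuiltInTemplates, the stylesheet, and the input document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BuiltInTemplates {
    // Hypothetical stylesheet: one explicit template; everything else uses the defaults.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='Title'>[<xsl:value-of select='.'/>]</xsl:template>"
      + "</xsl:stylesheet>";

    // Apply XSL to the given document and return the text output.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Song genre='jazz'><Title>So What</Title>"
                   + "<Artist>Miles Davis</Artist></Song>";
        // The root and Song fall through to the default rule, which recurses into children.
        // The genre attribute is ignored; Artist's text is copied by the built-in text rule.
        System.out.println(transform(xml)); // [So What]Miles Davis
    }
}
```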
Two subelements built using restricted application of XPath from within XML Schema

Selector: specify a set of objects: this is the scope over which uniqueness applies
Field: specify what is unique for each member of the above set: this is the identifier within the targeted scope
  Multiple fields are treated as ordered to produce a tuple of values for each member of the set
  The order matters for matching keyref to key
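A selector and field pair can be exercised with the schema validator in the JDK (javax.xml.validation). The sketch below declares a key whose selector picks the member children of a club element and whose single field is the code attribute. The schema, the element name club, and the class name KeyConstraintDemo are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class KeyConstraintDemo {
    // Hypothetical schema: each member of a club must carry a distinct code.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='club'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='member' maxOccurs='unbounded'>"
      + "<xs:complexType>"
      + "<xs:attribute name='code' type='xs:string' use='required'/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:sequence></xs:complexType>"
      + "<xs:key name='memberKey'>"
      + "<xs:selector xpath='member'/>"   // scope: the set of member elements
      + "<xs:field xpath='@code'/>"       // identifier within that scope
      + "</xs:key>"
      + "</xs:element>"
      + "</xs:schema>";

    // Return true if the document satisfies the schema, including the key constraint.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;   // validation error, e.g. a duplicate key value
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<club><member code='a'/><member code='b'/></club>")); // true
        System.out.println(isValid("<club><member code='a'/><member code='a'/></club>")); // false
    }
}
```

With more than one field element, the validator would compare ordered tuples of values instead of single values.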
Basis for parsing XML, which provides a node-labeled tree in its API

Conceptually simple: traverse by requesting element, its attribute values, and its children
Processing program reflects document structure, as in recursive descent
Can edit documents
Inefficient for large documents: parses them first entirely even if a tiny part is needed
Can validate with respect to a schema
import org.apache.xerces.parsers.DOMParser;  // assumed to be the Xerces parser; not part of the JDK
import org.w3c.dom.*;

DOMParser p = new DOMParser();
p.parse("filename");
Document d = p.getDocument();
Element s = d.getDocumentElement();
NodeList l = s.getElementsByTagName("member");
Element m = (Element) l.item(0);
// getAttribute returns a String, so convert it explicitly
int code = Integer.parseInt(m.getAttribute("code"));
NodeList kids = m.getChildNodes();
Node kid = kids.item(0);
String elemName = ((Element) kid).getTagName(); ...
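The same traversal can be run without extra libraries through the JDK's own DOM entry point, DocumentBuilderFactory. This is a sketch under stated assumptions: the class name DomWalk and the sample club document are invented for the example, and the first child of member is assumed to be an element (whitespace before it would make it a text node and the cast would fail).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomWalk {
    // Return "<code>:<tag name of the first child>" for the first member element.
    static String describeFirstMember(String xml) {
        try {
            // The whole document is parsed into a tree before any node is requested.
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = d.getDocumentElement();
            NodeList members = root.getElementsByTagName("member");
            Element m = (Element) members.item(0);
            String code = m.getAttribute("code");
            Node kid = m.getChildNodes().item(0);
            return code + ":" + ((Element) kid).getTagName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        String xml = "<club><member code=\"7\"><name>Ann</name></member></club>";
        System.out.println(describeFirstMember(xml)); // 7:name
    }
}
```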