XML Language Family Detailed Examples


• Most information contained in these slides comes from: http://www.w3.org, http://www.zvon.org/

• These slides are intended to be used as a tutorial on XML and related technologies

• Slide author: Jürgen Mangler

• This section contains examples on:

XPath, XPointer

XPath is the result of an effort to provide a common syntax and semantics for functionality shared between XSL Transformations [XSLT] and XPointer. The primary purpose of XPath is to address parts of an XML document.

• XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values.

• XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax.

• XPath gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document.

http://www.w3.org/TR/xpath#XSLT

• In addition to its use for addressing, XPath is also designed to feature a natural subset that can be used for matching (testing whether or not a node matches a pattern); this use of XPath is described in XSLT (next chapter).

• XPath models an XML document as a tree of nodes. There are different types of nodes, including element nodes, attribute nodes and text nodes.

http://www.w3.org/TR/WD-xslt#patterns

The basic XPath syntax is similar to filesystem addressing. If the path starts with a slash (/), it represents an absolute path to the required element.

/AAA/CCC

Select all elements CCC which are children of the root element AAA.

<AAA> <BBB/> <CCC/> <BBB/> <BBB/> <DDD> <BBB/> </DDD> <CCC/> </AAA>

/AAA

Select the root element AAA.

<AAA> <BBB/> <CCC/> <BBB/> <BBB/> <DDD> <BBB/> </DDD> <CCC/> </AAA>
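The two absolute paths above can be tried out with a minimal Python sketch. It assumes the standard-library module xml.etree.ElementTree, which supports only a subset of XPath and rejects absolute paths on an element, so the paths are rewritten relative to the root element AAA. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Sample document from the slide above.
XML = "<AAA><BBB/><CCC/><BBB/><BBB/><DDD><BBB/></DDD><CCC/></AAA>"
root = ET.fromstring(XML)  # root is the element AAA

# /AAA: the root element itself. ElementTree does not accept absolute
# paths on an element, so the root is checked by its tag instead.
selected_root = [root] if root.tag == "AAA" else []

# /AAA/CCC: CCC children of the root. The relative path "CCC" is
# evaluated with AAA as the context node.
ccc_children = root.findall("CCC")

print([e.tag for e in selected_root])  # ['AAA']
print(len(ccc_children))               # 2
```

A full XPath 1.0 engine would accept the absolute paths directly; the rewrite here is only a workaround for the limited engine assumed in this sketch.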

If the path starts with //, then all elements in the document that fulfill the criteria following // are selected.

//DDD/BBB

Select all elements BBB which are children of DDD.

<AAA> <BBB/> <DDD> <BBB/> </DDD> <CCC> <DDD> <BBB/> <BBB/> </DDD> </CCC> </AAA>

//BBB

Select all elements BBB.
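As a rough illustration, the // examples can be reproduced in Python with xml.etree.ElementTree (an assumption of this sketch, not something the slides prescribe). ElementTree needs a leading "." so that the search starts at the context node; for a root element that does not itself match, the result is the same as for the absolute form.

```python
import xml.etree.ElementTree as ET

# Sample document from the //DDD/BBB slide.
XML = "<AAA><BBB/><DDD><BBB/></DDD><CCC><DDD><BBB/><BBB/></DDD></CCC></AAA>"
root = ET.fromstring(XML)

# //DDD/BBB: BBB children of any DDD, at any depth.
ddd_bbb = root.findall(".//DDD/BBB")

# //BBB: every BBB element in the document.
all_bbb = root.findall(".//BBB")

print(len(ddd_bbb))  # 3
print(len(all_bbb))  # 4
```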


The star * selects all elements located by the preceding path.

/*/*/*/BBB

Select all elements BBB which have exactly 3 ancestors.

<AAA> <CCC> <DDD> <BBB/> </DDD> </CCC> <CCC> <DDD> <BBB/> </DDD> </CCC> </AAA>

/AAA/CCC/DDD/*

Select all elements enclosed by elements /AAA/CCC/DDD.
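The wildcard examples can be sketched the same way. This assumes Python's xml.etree.ElementTree; because its paths are relative to the root element, the first step of each absolute path (the root itself) is dropped.

```python
import xml.etree.ElementTree as ET

# Sample document from the wildcard slide.
XML = ("<AAA><CCC><DDD><BBB/></DDD></CCC>"
       "<CCC><DDD><BBB/></DDD></CCC></AAA>")
root = ET.fromstring(XML)

# /*/*/*/BBB: the first * is the root element, so two wildcard steps
# remain when the path is evaluated relative to the root.
depth3_bbb = root.findall("*/*/BBB")

# /AAA/CCC/DDD/*: every element child of the DDD elements.
ddd_children = root.findall("CCC/DDD/*")

print(len(depth3_bbb))                # 2
print([e.tag for e in ddd_children])  # ['BBB', 'BBB']
```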


The expression in square brackets can further specify an element. A number in the brackets gives the position of the element in the selected set. The function last() selects the last element in the selection.

/papers/paper[last()]

Select the last paper child of element papers.

<papers> <paper author="motschnig"/> <paper author="derntl"/> <paper author="motschnig"/> <paper author="mangler"/> </papers>

/papers/paper[1]

Select the first paper child of element papers.

<papers> <paper author="motschnig"/> <paper author="derntl"/> <paper author="motschnig"/> <paper author="mangler"/> </papers>
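Positional predicates are among the XPath features that Python's xml.etree.ElementTree supports directly, so they make a convenient check of the examples above. This is a sketch under that assumption; positions are 1-based, as in XPath.

```python
import xml.etree.ElementTree as ET

XML = ('<papers><paper author="motschnig"/><paper author="derntl"/>'
       '<paper author="motschnig"/><paper author="mangler"/></papers>')
root = ET.fromstring(XML)  # root is the element papers

# /papers/paper[1]: the first paper child.
first = root.findall("paper[1]")

# /papers/paper[last()]: the last paper child.
last = root.findall("paper[last()]")

print(first[0].get("author"))  # motschnig
print(last[0].get("author"))   # mangler
```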

Attributes are specified by the @ prefix.

//student[@matnr]

Select student elements which have the attribute matnr.

<students> <student matnr="9506264"/> <student matnr="0002843"/> <student name="Hauer"/> <student/> </students>

//@matnr

Select all matnr attributes.

<students> <student matnr="9506264"/> <student matnr="0002843"/> <student name="Hauer"/> <student/> </students>

//student[not(@*)]

Select student elements without any attribute.

<students> <student id="9506264"/> <student id="0002843"/> <student name="Koegler"/> <student/> </students>

//student[@*]

Select student elements which have any attribute.

<students> <student id="9506264"/> <student id="0002843"/> <student name="Hauer"/> <student/> </students>
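The attribute tests above can be approximated in Python. In this sketch (which assumes xml.etree.ElementTree), only [@matnr] is evaluated as a real path expression; //@matnr, [@*] and [not(@*)] are not supported by that module and are emulated with plain Python over each element's attribute dictionary.

```python
import xml.etree.ElementTree as ET

XML = ('<students><student matnr="9506264"/><student matnr="0002843"/>'
       '<student name="Hauer"/><student/></students>')
root = ET.fromstring(XML)

# //student[@matnr]: supported directly by ElementTree.
with_matnr = root.findall(".//student[@matnr]")

# //@matnr: emulated by collecting the attribute values.
matnr_values = [e.get("matnr") for e in root.iter() if "matnr" in e.attrib]

# //student[@*] and //student[not(@*)]: emulated via the attrib dict.
with_any_attr = [e for e in root.iter("student") if e.attrib]
without_attr = [e for e in root.iter("student") if not e.attrib]

print(len(with_matnr))     # 2
print(matnr_values)        # ['9506264', '0002843']
print(len(with_any_attr))  # 3
print(len(without_attr))   # 1
```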

Values of attributes can be used as selection criteria. Function normalize-space removes leading and trailing spaces and replaces sequences of whitespace characters by a single space.

//student[normalize-space(@name)='hauer']

Select student elements which have an attribute name with value hauer; leading and trailing spaces are removed before comparison.

<students> <student matnr="9506264"/> <student name=" hauer "/> <student name="hauer"/> </students>

//student[@matnr='9506264']

Select student elements which have attribute matnr with value 9506264.

<students> <student matnr="9506264"/> <student name=" hauer "/> <student name="hauer"/> </students>
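To make the effect of normalize-space concrete, here is a small Python sketch. The helper normalize_space is a hypothetical re-implementation written for this example (xml.etree.ElementTree has no such function); it treats space, tab, carriage return and line feed as whitespace, as XPath 1.0 does.

```python
import re
import xml.etree.ElementTree as ET

def normalize_space(value):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", value.strip(" \t\r\n"))

XML = ('<students><student matnr="9506264"/>'
       '<student name=" hauer "/><student name="hauer"/></students>')
root = ET.fromstring(XML)

# //student[normalize-space(@name)='hauer'], emulated in Python.
# A missing attribute behaves like the empty string.
normalized_matches = [e for e in root.iter("student")
                      if normalize_space(e.get("name", "")) == "hauer"]

# //student[@name='hauer']: exact comparison, supported by ElementTree.
exact_matches = root.findall(".//student[@name='hauer']")

print(len(normalized_matches))  # 2
print(len(exact_matches))       # 1
```

The difference between the two counts is the element whose value is " hauer " with surrounding spaces.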

Function count() counts the number of selected elements.

//*[count(*)=3]

Select elements which have 3 children.

<AAA> <CCC> <BBB/> <BBB/> <BBB/> </CCC> <DDD> <BBB/> </DDD> <EEE> <CCC/> </EEE> </AAA>

//*[count(BBB)=2]

Select elements which have two children BBB.

<AAA> <CCC> <BBB/> </CCC> <DDD> <BBB/> <BBB/> </DDD> <EEE> <CCC/> <DDD/> </EEE> </AAA>
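The count() predicates can be emulated in Python as a quick check. This sketch assumes xml.etree.ElementTree, which has no count() function; len() on an element gives the number of its element children, which corresponds to count(*).

```python
import xml.etree.ElementTree as ET

XML_A = ("<AAA><CCC><BBB/><BBB/><BBB/></CCC>"
         "<DDD><BBB/></DDD><EEE><CCC/></EEE></AAA>")
XML_B = ("<AAA><CCC><BBB/></CCC><DDD><BBB/><BBB/></DDD>"
         "<EEE><CCC/><DDD/></EEE></AAA>")

root_a = ET.fromstring(XML_A)
root_b = ET.fromstring(XML_B)

# //*[count(*)=3]: elements with exactly three element children.
# iter() visits the root too, just as //* includes the root element.
three_children = [e.tag for e in root_a.iter() if len(e) == 3]

# //*[count(BBB)=2]: elements with exactly two BBB children.
two_bbb = [e.tag for e in root_b.iter() if len(e.findall("BBB")) == 2]

print(three_children)  # ['AAA', 'CCC']
print(two_bbb)         # ['DDD']
```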

Several paths can be combined with the | separator. The | operator forms the union of the selected node sets, so a node is selected if it matches any of the paths (a set-level "or").

/AAA/EEE | //DDD/CCC | /AAA | //BBB

The number of combinations is not restricted.

<AAA> <BBB/> <CCC/> <DDD> <CCC/> </DDD> <EEE/> </AAA>

/AAA/EEE | //BBB

Select all elements BBB and elements EEE which are children of root element AAA.

<AAA> <BBB/> <CCC/> <DDD> <CCC/> </DDD> <EEE/> </AAA>
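A union of paths can be sketched in Python as follows. xml.etree.ElementTree (assumed here) does not support the | operator, so the hypothetical helper union merges the individual results, removes duplicates and returns the nodes in document order, which is how a node set is normally presented.

```python
import xml.etree.ElementTree as ET

def union(root, *node_lists):
    """Merge node lists without duplicates, in document order."""
    wanted = {id(node) for nodes in node_lists for node in nodes}
    return [node for node in root.iter() if id(node) in wanted]

XML = "<AAA><BBB/><CCC/><DDD><CCC/></DDD><EEE/></AAA>"
root = ET.fromstring(XML)

# /AAA/EEE | //BBB
two_paths = union(root, root.findall("EEE"), root.findall(".//BBB"))

# /AAA/EEE | //DDD/CCC | /AAA | //BBB
four_paths = union(root,
                   root.findall("EEE"),
                   root.findall(".//DDD/CCC"),
                   [root],
                   root.findall(".//BBB"))

print([e.tag for e in two_paths])   # ['BBB', 'EEE']
print([e.tag for e in four_paths])  # ['AAA', 'BBB', 'CCC', 'EEE']
```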

Axes are a sophisticated concept in XPath to find out which nodes relate to each other and how.

<parent> <preceding-sibling/> <preceding-sibling/> <node> <descendant/> <descendant/> </node> <following-sibling/> <following-sibling/> </parent>

The above example illustrates how axes work. Starting from the element named node, each axis selects the elements that carry the axis name. This example is also the basis for the next two pages.

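[Figure: tree diagram of the example above. parent is at the top; its children are preceding-sibling, node and following-sibling; the two descendant elements sit below node.]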

The following main axes are available:

• the child axis contains the children of the context node

• the descendant axis contains the descendants of the context node; a descendant is a child or a child of a child and so on; thus the descendant axis never contains attribute or namespace nodes

• the parent axis contains the parent of the context node, if there is one

• the following-sibling axis contains all the following siblings of the context node; if the context node is an attribute node or namespace node, the following-sibling axis is empty

• the preceding-sibling axis contains all the preceding siblings of the context node; if the context node is an attribute node or namespace node, the preceding-sibling axis is empty

• (http://www.w3.org/TR/xpath#axes)

http://www.w3.org/TR/xpath#dt-parent

The child axis contains the children of the context node. The child axis is the default axis and it can be omitted. The descendant axis contains the descendants of the context node; a descendant is a child or a child of a child and so on; thus the descendant axis never contains attribute or namespace nodes.

//CCC/descendant::DDD

Select elements DDD which have CCC among their ancestors.

<CCC> <DDD> <EEE> <DDD/> </EEE> </DDD> </CCC>

/AAA

Equivalent of /child::AAA.

<AAA> <BBB/> <CCC/> </AAA>
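The five main axes can be emulated on the parent/node example in plain Python. This sketch assumes xml.etree.ElementTree, which has no axis syntax and no parent pointers, so a child-to-parent map is built first; all variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XML = ("<parent><preceding-sibling/><preceding-sibling/>"
       "<node><descendant/><descendant/></node>"
       "<following-sibling/><following-sibling/></parent>")
root = ET.fromstring(XML)

# ElementTree elements do not know their parent, so record it up front.
parent_of = {child: parent for parent in root.iter() for child in parent}

node = root.find("node")  # the context node

parent = parent_of[node]                              # parent::*
siblings = list(parent)
index = siblings.index(node)
preceding = siblings[:index]                          # preceding-sibling::*
following = siblings[index + 1:]                      # following-sibling::*
children = list(node)                                 # child::*
descendants = [e for e in node.iter() if e is not node]  # descendant::*

print(parent.tag)                       # parent
print([e.tag for e in preceding])       # ['preceding-sibling', 'preceding-sibling']
print([e.tag for e in following])       # ['following-sibling', 'following-sibling']
print([e.tag for e in descendants])     # ['descendant', 'descendant']
```

In this small document the child and descendant axes give the same result, because node has no grandchildren.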

XPointer is intended to be the basis of fragment identifiers only for the text/xml and application/xml media types (they can point only to documents of these types).

Pointing to fragments of remote documents is analogous to the use of anchors in HTML. Roughly: document#xpointer(…)

<link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="mydocument.xml#xpointer(//AAA/BBB[1])"></link>

If there are forbidden characters in your expression, they must be escaped. When an XPointer appears in an XML document, special characters must be escaped according to the rules of XML.

<link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="test.xml#xpointer(//AAA[position() &lt; 2])"></link>

or, respectively:

<link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="test.xml#xpointer(string-range(//*,'^(text in'))"></link>

• The characters < and & must be escaped using &lt; and &amp;.

• Any unbalanced parenthesis must be escaped using a circumflex (^).
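The two escaping layers can be sketched in Python. The helper escape_xpointer_literal is hypothetical and deliberately simple: it escapes every parenthesis in a string literal (not only unbalanced ones) and doubles any literal circumflex, which this sketch assumes is acceptable to an XPointer processor. The XML layer uses quoteattr from the standard library.

```python
from xml.sax.saxutils import quoteattr

def escape_xpointer_literal(text):
    """Circumflex-escape a string literal for use inside xpointer(...).

    Assumption of this sketch: escaping all parentheses is acceptable,
    so no balance check is needed. The circumflex is doubled first so
    that the escapes added afterwards are not escaped again.
    """
    return text.replace("^", "^^").replace("(", "^(").replace(")", "^)")

# Layer 1: XPointer escaping of the searched string.
literal = escape_xpointer_literal("(text in")

# Layer 2: XML escaping of the whole attribute value.
href_position = quoteattr("test.xml#xpointer(//AAA[position() < 2])")
href_string = quoteattr(
    "test.xml#xpointer(string-range(//*,'%s'))" % literal)

print(literal)        # ^(text in
print(href_position)  # "test.xml#xpointer(//AAA[position() &lt; 2])"
print(href_string)    # "test.xml#xpointer(string-range(//*,'^(text in'))"
```

The order matters: the XPointer escapes are applied to the expression first, and the XML escapes are applied last, when the finished URI reference is written into the attribute.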

If your elements have an ID-type attribute, you can address them directly using the value of the ID-type attribute. (Don't forget: you must have an attribute defined as an ID type in your DTD!) Using ID-type attributes, you can easily include or jump to parts of documents. The example below selects the node with id("b1").

xpointer(id("b1"))

<book> <book id="b1" name="XML">Bad book.</book> <book id="b2" name="JAVA"> Good book. <additional>Makes me sleep like a baby.</additional> </book> <book id="123" name="42">All answers on only one page.</book></book>
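The lookup can be approximated in Python, with one important caveat: xml.etree.ElementTree does not read DTDs and has no notion of ID-typed attributes. The sketch below therefore simply assumes that the attribute named id is the one declared as ID, and matches on its value.

```python
import xml.etree.ElementTree as ET

XML = ('<book><book id="b1" name="XML">Bad book.</book>'
       '<book id="b2" name="JAVA">Good book.'
       '<additional>Makes me sleep like a baby.</additional></book>'
       '<book id="123" name="42">All answers on only one page.</book></book>')
root = ET.fromstring(XML)

# Emulates xpointer(id("b1")) under the assumption that "id" is the
# ID-typed attribute; a real processor would consult the DTD instead.
target = root.find(".//*[@id='b1']")

print(target.get("name"))  # XML
print(target.text)         # Bad book.
```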

The specification defines one full form and one shorthand form (which is an abbreviation of the full one).

<AAA> <BBB myid="b1" bbb="111">Text in the first element BBB.</BBB> <BBB myid="b2" bbb="222"> Text in another element BBB. <DDD ddd="999">Text in more nested element.</DDD> <DDD ddd="888">Text in more nested element.</DDD> <DDD ddd="777">Text in more nested element.</DDD> </BBB> <CCC ccc="123" xxx="321">Again some text in some element.</CCC> </AAA>

• Short Form: /1/2/3

• Full Form: xpointer(/*[1]/*[2]/*[3])
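How the short form walks the tree can be shown with a small Python sketch. The function resolve_child_sequence is a hypothetical helper written for this example, assuming xml.etree.ElementTree; each number is a 1-based position among element children, and the first step addresses the document element.

```python
import xml.etree.ElementTree as ET

def resolve_child_sequence(root, sequence):
    """Resolve a short form such as "/1/2/3"; return None if it fails."""
    steps = [int(step) for step in sequence.strip("/").split("/")]
    if steps[0] != 1:  # a document has exactly one document element
        return None
    current = root
    for position in steps[1:]:
        children = list(current)  # element children only
        if position < 1 or position > len(children):
            return None
        current = children[position - 1]
    return current

XML = ('<AAA><BBB myid="b1" bbb="111">Text in the first element BBB.</BBB>'
       '<BBB myid="b2" bbb="222">Text in another element BBB.'
       '<DDD ddd="999">Text in more nested element.</DDD>'
       '<DDD ddd="888">Text in more nested element.</DDD>'
       '<DDD ddd="777">Text in more nested element.</DDD></BBB>'
       '<CCC ccc="123" xxx="321">Again some text in some element.</CCC></AAA>')
root = ET.fromstring(XML)

target = resolve_child_sequence(root, "/1/2/3")
print(target.tag, target.get("ddd"))  # DDD 777
```

On the sample document, /1/2/3 reads as: the document element AAA, its second child (the second BBB), and that element's third child (the DDD with ddd="777").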

A location of type point is defined by a node, called the container node (the node that contains the point), and a non-negative integer, called the index. (//AAA and //AAA/BBB are the container nodes; [1] and [2] are used if more than one container node of the same name exists.) In the examples, ▼ marks the selected point.

xpointer(start-point(//AAA))

xpointer(start-point(range(//AAA/BBB[1])))

<AAA>▼ <BBB bbb="111"></BBB> <BBB bbb="222"> <DDD ddd="999"></DDD> </BBB> <CCC ccc="123" xxx="321"/> </AAA>

<AAA> <BBB bbb="111"></BBB> <BBB bbb="222"> <DDD ddd="999"></DDD> </BBB>▼ <CCC ccc="123" xxx="321"/> </AAA>

xpointer(end-point(range(//AAA/BBB[2])))

xpointer(start-point(range(//AAA/CCC)))

When the container node of a point is of a node type that cannot have child nodes (such as text nodes, comments, and processing instructions), then the index is an index into the characters of the string-value of the node; such a point is called a character-point. You can use this to write a link that behaves like a search function. It always jumps to the first appearance of a string, e.g. the word "another".

xpointer(start-point(string-range(//*,'another', 2, 0)))

<AAA> <BBB bbb="111">Text in the first element BBB.</BBB> <BBB bbb="222"> Text in a▼nother element BBB. <DDD ddd="999">Text in more nested element.</DDD> </BBB> <CCC ccc="123" xxx="321">Again some text in some element.</CCC></AAA>
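The character-point in this example can be worked out with a rough Python sketch. It is a simplification and not an XPointer implementation: the hypothetical helpers below search each text node separately (so a match that spans several text nodes is not found) and return the text node together with the character index. The position argument plays the role of the third argument of string-range.

```python
import xml.etree.ElementTree as ET

def text_nodes(element):
    """Yield the text nodes below element in document order."""
    if element.text:
        yield element.text
    for child in element:
        yield from text_nodes(child)
        if child.tail:
            yield child.tail

def character_point(root, needle, position=1):
    """Return (text node, index) for the first match, or None.

    position is 1-based and counted from the start of the match.
    """
    for text in text_nodes(root):
        start = text.find(needle)
        if start != -1:
            return text, start + position - 1
    return None

XML = ('<AAA><BBB bbb="111">Text in the first element BBB.</BBB>'
       '<BBB bbb="222">Text in another element BBB.'
       '<DDD ddd="999">Text in more nested element.</DDD></BBB>'
       '<CCC ccc="123" xxx="321">Again some text in some element.</CCC></AAA>')
root = ET.fromstring(XML)

text, index = character_point(root, "another", position=2)
print(index)         # 9
print(text[:index])  # Text in a
```

The index 9 places the point after the ninth character of the text node, that is, between the "a" and the "n" of "another", which matches the marker in the sample above.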

The range function returns ranges covering the locations in the argument location-set. For each location x in the argument location-set, a range location representing the covering range of x is added to the result location set.

xpointer(range(//AAA/BBB[2]))

<AAA> <BBB bbb="111"/> <BBB bbb="222"> Text in another element BBB. </BBB> <CCC ccc="123" xxx="321"/></AAA>

The range-inside function returns ranges covering the contents of the locations in the argument location-set.

xpointer(range-inside(//AAA/BBB[2]))

<AAA> <BBB bbb="111"/> <BBB bbb="222"> Text in another element BBB. </BBB> <CCC ccc="123" xxx="321"/></AAA>

For each location x in the argument location-set, end-point adds a location of type point to the result location-set. That point represents the end point of location x.

xpointer(end-point(string-range(//AAA/BBB,'another')))

<AAA> <BBB bbb="111">Text in the first element BBB.</BBB> <BBB bbb="222"> Text in another▼ element BBB. <DDD ddd="999">Text in more nested element.</DDD> </BBB> <CCC ccc="123" xxx="321">Again some text in some element.</CCC> </AAA>
