Introduction to XPath
Kristian Torp
Department of Computer Science, Aalborg University
people.cs.aau.dk/~torp [email protected]
November 3, 2015
daisy.aau.dk
Kristian Torp (Aalborg University) Introduction to XPath November 3, 2015 1 / 59
Outline
1 Introduction
2 Tree Terminology
3 Location Path and Steps
4 XPath Path Expressions
5 Axes
6 Summary
Learning Goals and Focus
Learning Goals
Understand the XPath data model
Know the basic tree terminology
Be good at querying XML documents using XPath
Know the abbreviations used in XPath
Very handy to know in practice
Compact and quite readable!
Database Focus
All XML technologies are presented from a database perspective, also called a data focus (i.e., not a document focus)!
Introduction
Example
Find all courses: /coursecatalog/course
Find the semesters: //semester/text()
Overview
A language for
finding/addressing information in XML documents
navigating through elements and attributes in an XML document
Used in many XML technologies, e.g., XQuery and XPointer
A part of the XSLT recommendation
Microsoft/Visual Studio makes heavy use of XSLT
The data model is an abstract and logical structure of an XML document
Called a node tree
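A minimal sketch of the two example queries, using Python's standard xml.etree.ElementTree module. The XML text is an assumed serialization of the course catalog used in these slides. ElementTree supports only a subset of XPath (no absolute paths on elements and no text() test), so the paths are written relative to the root element and the text is read through the .text attribute.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog shown in the slides.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)
root = ET.fromstring(XML)  # root is the coursecatalog element

# /coursecatalog/course, written relative to the root element
courses = root.findall("course")
course_ids = [c.get("id") for c in courses]

# //semester/text(): there is no text() test here, so read .text instead
semesters = [s.text for s in root.findall(".//semester")]

print(course_ids)
print(semesters)
```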
The Node Tree
Terminology
Document node: The entire XML document
Also called the document root or the root node
Element node: An XML element
A special one is the document element or root element
Text node: The text strings in an element node
Attribute node: An attribute
Example (A Node Tree)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Example: Find the Courses
Example (Document)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Query
/coursecatalog/course
Result
course
  id=4
  name
    OOP
  semester
    3
  desc
    snip
course
  id=2
  name
    DB
  semester
    7
  desc
    snip
Example: Find the Semesters
Example (Document)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Query
//semester/text()
Result
3 7
Major Components
Components
Nodes
XML document treated as a tree of nodes
Examples: Elements, attributes, and comments
Path expressions
Select a set of nodes in an XML document
Examples: /, /coursecatalog/course
Standard functions
Approximately 100 built-in functions
Examples: concat('a', 'b'), round(1.5)
Quiz
Example (Document)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Questions
Who is the parent of the document element?
How many document elements are there in an XML document?
How many elements can there be in an XML document?
Are elements and attributes the same node type?
Tree Terminology
Example (A Node Tree)
[Figure: a node tree with eleven nodes labeled 1 to 9, A, and B]
Tree
Like any other tree in CS
[Figure: the same node tree with the root highlighted]
Quiz
Are there other roots?
[Figure: the same node tree with leaves highlighted]
Quiz
Are there more leaves?
[Figure: the same node tree with the children of 1 highlighted]
Quiz
Who are the children of 3?
[Figure: the same node tree with the siblings of 9 highlighted]
Quiz
Who are the siblings of 3?
[Figure: the same node tree with the ancestors of 6 highlighted]
Quiz
Who are the ancestors of 9?
[Figure: the same node tree with the parent of 8 highlighted]
Quiz
Who are the parents of 4?
[Figure: the same node tree with the descendants of 1 highlighted]
Quiz
Who are the descendants of 5?
Quiz
Example (Another Node Tree)
[Figure: a node tree with nineteen nodes labeled 1 to 9 and A to J]
Questions
Parent of E?
Children of 2?
Descendants of 2?
Location Path and Location Step I
Definition (Location Path)
A location path evaluates to a sequence of nodes
Example (Location Path)
/child::coursecatalog/child::course[name='OOP' or name='DB'][@id<10]
Definition (Location Step)
A location path consists of a number of location steps separated by /.
Example (Location Steps)
child::coursecatalog
child::course[name='OOP' or name='DB'][@id<10]
Location Path and Location Step II
Definition
A location step consists of an axis, a node test, and a set of predicates
Example (One)
child::coursecatalog
Axis: child
Node test: coursecatalog
Predicates: empty
Example (Two)
child::course[name='OOP' or name='DB'][@id<10]
Axis: child
Node test: course
Predicates: [name='OOP' or name='DB'][@id<10]
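The decomposition of a location step into its three parts can be sketched in a few lines of Python. The function name split_step is hypothetical, and the sketch assumes an unabbreviated step (axis::nodetest[...]) whose predicates contain no nested brackets; it is not a full XPath parser.

```python
import re

def split_step(step):
    """Split one unabbreviated location step into (axis, node test, predicates)."""
    # Everything before the first '::' is the axis name.
    axis, _, rest = step.partition("::")
    # The node test runs up to the first predicate bracket, if any.
    node_test = rest.split("[", 1)[0]
    # Each [...] group is one predicate (nested brackets are not handled).
    predicates = re.findall(r"\[([^\]]*)\]", rest)
    return axis, node_test, predicates

print(split_step("child::coursecatalog"))
print(split_step("child::course[name='OOP' or name='DB'][@id<10]"))
```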
Abbreviations
Most Used
Abbreviation      Meaning
.                 self::node()
..                parent::node()
//coursecatalog   /descendant-or-self::node()/child::coursecatalog
course            child::course
Example (Abbreviations in Action)
Abbreviation            Meaning
//name                  /descendant-or-self::node()/child::name
//name/..               /descendant-or-self::node()/child::name/parent::node()
/coursecatalog/course   /child::coursecatalog/child::course
Note
Abbreviations make the expression more readable
Sometimes abbreviations can make it hard to guess the result
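A small sketch of the abbreviations . and .. in action, using Python's standard xml.etree.ElementTree module, which supports these abbreviations but not the unabbreviated axis syntax. The XML text is an assumed serialization of the course catalog, and paths are relative to the root element because ElementTree does not accept absolute paths on elements.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog shown in the slides.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)
root = ET.fromstring(XML)

# '.' selects the context node itself (here the coursecatalog element).
self_nodes = root.findall(".")

# //name/.. : the parent of every name element, i.e. the course elements.
parents = root.findall(".//name/..")
parent_ids = [p.get("id") for p in parents]

print(self_nodes[0].tag)
print(parent_ids)
```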
Evaluation of Location Path I
Example XML Document
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Evaluate the Location Path
/child::coursecatalog/child::course[name='OOP' or name='DB'][@id<10]/name
Evaluation of Location Path II
The Steps in the Evaluation
1 Starts with /, therefore the context node is set to the root node
2 Evaluate the location step child::coursecatalog
3 Result is the coursecatalog root element node
4 Set context to the root element node
5 Evaluate the location step child::course[name='OOP' or name='DB'][@id<10]
6 The result is the two course element nodes
7 Set context to the OOP course element node
8 Evaluate the location step child::name
9 Results in the name element node, which is the first part of the result
10 Set context to the DB course element node
11 Evaluate the location step child::name
12 Results in the name element node, which is the last part of the result
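The evaluation steps can be mimicked by hand in Python, one location step at a time. This is only a sketch: the XML text is an assumed serialization of the course catalog, the document node is modelled as a plain list holding the root element, and the predicates are written as ordinary Python tests rather than parsed from the path.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog shown in the slides.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)
# The document node has exactly one element child: the root element.
document_children = [ET.fromstring(XML)]

# Step child::coursecatalog, with the document node as context node.
step1 = [e for e in document_children if e.tag == "coursecatalog"]

# Step child::course[name='OOP' or name='DB'][@id<10],
# evaluated once for every context node produced by the previous step.
step2 = []
for context in step1:
    for child in context:
        if child.tag != "course":
            continue
        if child.findtext("name") not in ("OOP", "DB"):
            continue
        # A missing id becomes NaN, and NaN < 10 is false.
        if not float(child.get("id", "nan")) < 10:
            continue
        step2.append(child)

# Step child::name, evaluated for the OOP course and then the DB course.
result = [child for context in step2 for child in context if child.tag == "name"]
names = [n.text for n in result]
print(names)
```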
Context
Definition (Context)
A context node (a node in the node tree)
A context size and context position
A set of variable bindings
A function library
A set of namespace declarations
Definition (Context Size)
The context size is the length of the sequence of nodes returned by the previous location step
Definition (Context Position)
The context position is the position of the current node in the sequence being evaluated
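Context size and context position can be illustrated with a plain Python list. The node labels are hypothetical stand-ins for the two course nodes; the point is only that the size is shared by the whole sequence while the position counts from 1.

```python
# Hypothetical stand-ins for the nodes returned by the previous location step.
nodes = ["course id=4", "course id=2"]

# The context size is the same for every node in the sequence.
context_size = len(nodes)

# Each node is evaluated with its own context position, counted from 1.
contexts = [(position, context_size, node)
            for position, node in enumerate(nodes, start=1)]

# A predicate such as [last()] keeps the node whose position equals the size.
last_nodes = [node for position, size, node in contexts if position == size]

print(contexts)
print(last_nodes)
```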
Compact Notation for Node Tree
Example (The Node Tree)
/
  coursecatalog
    course
      id=4
      name
        OOP
      semester
        3
      desc
        snip
    course
      id=2
      name
        DB
      semester
        7
      desc
        snip
Example (The Equivalent Compact Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Example: Find the Courses
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/course
Result
course:OOP
id=4 name:OOP sem:3 dsc
course:DB
id=2 name:DB sem:7 dsc
Example: Find Elements That Do Not Exist
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/name
Result
Empty: there is no name element directly below coursecatalog!
Note that it is not an error!
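The "empty, not an error" behaviour can be observed with Python's standard xml.etree.ElementTree module. The XML text is an assumed serialization of the course catalog, and the path is relative to the root element (the coursecatalog element).

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog shown in the slides.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)
root = ET.fromstring(XML)

# /coursecatalog/name: no name element is a direct child of coursecatalog.
direct_names = root.findall("name")

# /coursecatalog//name: name elements at any depth below coursecatalog.
all_names = [n.text for n in root.findall(".//name")]

print(direct_names)  # an empty list, no exception is raised
print(all_names)
```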
Example: Find the Course Names
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog//name
Result
name:OOP name:DB
Examples: Find the OOP Course
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/course[name="OOP"]
Result
course:OOP
id=4 name:OOP sem:3 dsc
Example: Find a Course Based on ID
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/course[@id="2"]
Result
course:DB
id=2 name:DB sem:7 dsc
Example: Filter on an Attribute
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/course[@id="2"]/name
Result
name:DB
Example: Get the Name of a Course as a String
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/course[@id="2"]/name/text()
Result
The string DB
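The predicate queries on the last few slides can be tried with Python's standard xml.etree.ElementTree module, which supports equality predicates on child elements and attributes (comparisons such as @id<10 are not supported). The XML text is an assumed serialization of the course catalog, and paths are relative to the root element.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog shown in the slides.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)
root = ET.fromstring(XML)

# /coursecatalog/course[name="OOP"]: filter on a child element's text.
oop_courses = root.findall("course[name='OOP']")

# /coursecatalog/course[@id="2"]: filter on an attribute value.
db_courses = root.findall("course[@id='2']")

# /coursecatalog/course[@id="2"]/name/text(): read .text instead of text().
db_name_text = [n.text for n in root.findall("course[@id='2']/name")]

print([c.get("id") for c in oop_courses])
print([c.findtext("name") for c in db_courses])
print(db_name_text)
```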
Example: Use Parent Axis
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
//course[@id="2"]/parent::node()
Result
The coursecatalog element node, i.e., the root element with the entire tree below it
Example: Use Child Axis
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/child::node()
Result
course:OOP
id=4 name:OOP sem:3 dsc
course:DB
id=2 name:DB sem:7 dsc
Example: Use Descendant Axis
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
/coursecatalog/descendant::node()
Result
8 element nodes
6 text nodes
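The node counts can be checked with a short Python sketch. It assumes a serialization of the course catalog without whitespace between elements, so that no extra whitespace-only text nodes appear; ElementTree stores text as an attribute of the element rather than as a separate node, so the text nodes are counted through .text.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog, without whitespace between tags.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)
root = ET.fromstring(XML)

# iter() yields the start element first, so drop it to get the descendants.
descendants = list(root.iter())[1:]

element_count = len(descendants)
# An element with non-empty .text corresponds to one text node child.
text_count = sum(1 for e in descendants if e.text)

print(element_count, text_count)
```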
Example: Use Functions
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
concat("hello, ", "world!")
Result
The string 'hello, world!'
Example: Functions and XPath Expressions
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Query
concat("hello ", /coursecatalog/course[@id="2"]/name/text())
Result
The string 'hello DB'
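Python's standard xml.etree.ElementTree module has no XPath function library, so the same result can only be approximated by combining a path lookup with ordinary string concatenation. The XML text is an assumed serialization of the course catalog.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog shown in the slides.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)
root = ET.fromstring(XML)

# The path part of the query: the text of the name of the course with id 2.
name = root.findtext("course[@id='2']/name")

# The concat() part of the query, done in Python.
greeting = "hello " + name
print(greeting)
```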
Most Used Path Expressions
Often Used Expressions
Path Expression          Description
/                        select from the root node
//NodeName               select NodeName element nodes anywhere in the document
.                        select the current node
..                       select the parent of the current node
/NodeName[@id>7]         select based on attribute node
/NodeName[Node2='H']     select based on element node
/NodeName/text()         select the text node value
/NodeName/attribute()    select the attribute nodes
/NodeName[1]             select the first NodeName element node
/NodeName[last()]        select the last NodeName element node
Note
Almost like Linux/Unix directory navigation
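The positional predicates [1] and [last()] are among the few predicates that Python's standard xml.etree.ElementTree module supports, so they can be tried directly. The XML text is an assumed serialization of the course catalog, and paths are relative to the root element.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog shown in the slides.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)
root = ET.fromstring(XML)

# /coursecatalog/course[1]: positions are counted from 1, not 0.
first = root.findall("course[1]")

# /coursecatalog/course[last()]: the last course child.
last = root.findall("course[last()]")

print([c.findtext("name") for c in first])
print([c.findtext("name") for c in last])
```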
Quiz
Example (Node Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Questions
/coursecatalog/course/name returns?
/coursecatalog/teacher returns?
/coursecatalog is the same as /?
/coursecatalog/course/../course/../course returns?
/coursecatalog/course[@id<11]/name/text() returns?
Node Numbering
Example (Node Tree)
[Figure: a node tree with 14 nodes numbered 1 to 14]
Note
Depth-first numbering of nodes
Used for relative access to other nodes
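Depth-first numbering corresponds to document order. A sketch using the course catalog rather than the 14-node tree in the figure: the XML text is an assumed serialization, and only element nodes are numbered here because ElementTree does not expose text and attribute nodes as separate objects.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog shown in the slides.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)
root = ET.fromstring(XML)

# iter() walks the tree depth-first, i.e. in document order.
numbering = [(number, element.tag)
             for number, element in enumerate(root.iter(), start=1)]

for number, tag in numbering:
    print(number, tag)
```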
Forward and Backward Axes
Definition (Axis)
An axis is a sequence of nodes located relative to the context node.
Definition (Forward Axis)
A forward axis can only return the context node or nodes that are after it in document order.
Definition (Backward Axis)
A backward axis can only return the context node or nodes that are before it in document order.
The Axes
Axis Name            Direction   Description
attribute            forward     All my attributes
self                 forward     My self
child                forward     All my children
descendant           forward     All my children, grandchildren, etc.
parent               backward    My unique parent
ancestor             backward    My parent, grandparent, etc.
following            forward     All after me that are not descendants
preceding            backward    All before me that are not ancestors
following-sibling    forward     My "younger" siblings
preceding-sibling    backward    My "elder" siblings
descendant-or-self   forward     My self and all my descendants
ancestor-or-self     backward    My self and all my ancestors
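Several of the axes can be emulated in Python on top of a document-order list and a child-to-parent map. This is a sketch under stated assumptions: the XML text is an assumed serialization of the course catalog, the helper names are hypothetical, only element nodes are covered, and the document node is not modelled (so the root element has no ancestors here).

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog shown in the slides.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)
root = ET.fromstring(XML)
order = list(root.iter())                        # elements in document order
parent = {c: p for p in order for c in p}        # child element -> parent element

def ancestors(node):
    """ancestor axis: parent, grandparent, ... (nearest first)."""
    result = []
    while node in parent:
        node = parent[node]
        result.append(node)
    return result

def descendants(node):
    """descendant axis: everything below node, in document order."""
    return list(node.iter())[1:]

def following(node):
    """following axis: after node in document order, excluding descendants."""
    skip = set(descendants(node))
    return [n for n in order[order.index(node) + 1:] if n not in skip]

def preceding(node):
    """preceding axis: before node in document order, excluding ancestors."""
    skip = set(ancestors(node))
    return [n for n in order[:order.index(node)] if n not in skip]

oop, db = root.findall("course")
print([n.tag for n in following(oop)])
print([n.tag for n in preceding(db)])
print([n.tag for n in ancestors(db.find("desc"))])
```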
Child
Finds
Immediate descendants of the current node.
Numbering
[Figure: the depth-first numbered node tree, nodes 1 to 14]
Example
[Figure: the context node cur with three children numbered 1, 2, 3]
Quiz
What is the direction of the child axis (and why)?
Child Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/child::node()
Result: the two course nodes
/coursecatalog/course/child::node()
Result: six element nodes
/coursecatalog/course/attribute()
Result: two attribute nodes
/coursecatalog/course/semester/child::node()
Result: two text nodes
Parent
Finds
The one node immediately above
Numbering
[Figure: the depth-first numbered node tree, nodes 1 to 14]
Example
[Figure: the context node cur with its parent numbered 1]
Quiz
What is the direction of the parent axis (and why)?
Parent Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/course[@id='2']/name/parent::node()
Result: the course element node with id = 2
/coursecatalog/course/name/parent::node()
Result: the two course element nodes
/coursecatalog/parent::node()
Result: the document root
/parent::node()
Result: empty
Descendant
Finds
Children all the way down the tree
Numbering
[Figure: the depth-first numbered node tree, nodes 1 to 14]
Example
[Figure: the context node cur with five descendants numbered 1 to 5]
Descendant Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/descendant::node()
Result: 8 element nodes + 6 text nodes
/coursecatalog/course[name="OOP"]/descendant::node()
Result: 3 element nodes + 3 text nodes
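/coursecatalog/course[name="OOP"]/descendant::node()/attribute()
Result: empty, since no descendant of the OOP course carries an attribute in this tree (the id attribute belongs to the course element itself)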
/coursecatalog/course/name/descendant::node()
Result: two text nodes
Ancestor
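Finds
Parents all the way up the tree
Numbering
[Figure: the depth-first numbered node tree, nodes 1 to 14]
Example
[Figure: the context node cur with four ancestors numbered 1 (the parent) to 4 (the topmost ancestor)]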
Ancestor Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/course[name="DB"]/desc/ancestor::node()[2]
Result: the coursecatalog element node
/coursecatalog/course/name/ancestor::node()
Result: document node + coursecatalog node + 2 course nodes
/coursecatalog/ancestor::node()
Result: document root
/ancestor::node()
Result: empty
Following
Finds
All nodes that follow, excluding descendants
Numbering
[Figure: the depth-first numbered node tree, nodes 1 to 14]
Example
[Figure: the context node cur followed by five nodes numbered 1 to 5]
Following Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/course[@id="4"]/following::node()
Result: 4 element nodes + 3 text nodes
/coursecatalog/course[@id="2"]/following::node()
Result: empty
/coursecatalog/course[@id="4"]/name/text()/following::node()
Result: 6 element nodes and 5 text nodes
Preceding
Finds
All preceding nodes, excluding ancestors
Numbering
[Figure: the depth-first numbered node tree, nodes 1 to 14]
Example
[Figure: the context node cur preceded by three nodes numbered 1 to 3]
Preceding Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/course[@id="4"]/semester/text()/preceding::node()
Result: 1 element node + 1 text node; the root element is an ancestor and is therefore excluded
/coursecatalog/course/preceding::node()
Result: 4 element nodes (the OOP course and its children) + 3 text nodes
/coursecatalog/course[name="OOP"]/preceding::node()
Result: empty
Following Sibling
Finds
All sibling nodes that follow
Numbering
[Figure: the depth-first numbered node tree, nodes 1 to 14]
Example
[Figure: the context node cur followed by three siblings numbered 1, 2, 3]
Following Sibling Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/course/following-sibling::node()
Result: 1 element node (the DB course)
/coursecatalog/course[@id="2"]/following-sibling::node()
Result: empty
/coursecatalog/course/semester/following-sibling::node()
Result: 2 element nodes (descriptions)
/coursecatalog/course/@id/following-sibling::node()
Result: empty
Preceding Sibling
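Finds
All sibling nodes that come before
Numbering
[Figure: the depth-first numbered node tree, nodes 1 to 14]
Example
[Figure: the context node cur preceded by two siblings numbered 1 (the nearest) and 2]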
Preceding Sibling Examples
Example (Document Tree)
/coursecatalog
course
id=4 name:OOP sem:3 dsc
course
id=2 name:DB sem:7 dsc
Queries
/coursecatalog/course/preceding-sibling::node()
Result: 1 element node (the OOP course)
/coursecatalog/course[@id="2"]/preceding-sibling::node()
Result: 1 element node (the OOP course)
/coursecatalog/course/semester/preceding-sibling::node()
Result: 2 element nodes (names)
/coursecatalog/course/desc/preceding-sibling::node()
Result: 4 element nodes (0 attribute nodes)
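The two sibling axes can be emulated in Python by looking at the child list of the parent. This is a sketch: the XML text is an assumed serialization of the course catalog, the helper names are hypothetical, and only element nodes are covered.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the course catalog shown in the slides.
XML = (
    "<coursecatalog>"
    "<course id='4'><name>OOP</name><semester>3</semester><desc>snip</desc></course>"
    "<course id='2'><name>DB</name><semester>7</semester><desc>snip</desc></course>"
    "</coursecatalog>"
)
root = ET.fromstring(XML)
parent = {c: p for p in root.iter() for c in p}  # child element -> parent element

def following_siblings(node):
    """following-sibling axis: later children of the same parent."""
    if node not in parent:
        return []  # the root element has no element siblings
    siblings = list(parent[node])
    return siblings[siblings.index(node) + 1:]

def preceding_siblings(node):
    """preceding-sibling axis: earlier children of the same parent, nearest first."""
    if node not in parent:
        return []
    siblings = list(parent[node])
    return siblings[:siblings.index(node)][::-1]

oop, db = root.findall("course")
print([s.get("id") for s in following_siblings(oop)])
print([s.tag for s in preceding_siblings(db.find("desc"))])
```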
Summary: XPath
Main Points
XPath is widely used
Not an XML syntax!
XPath is used for many purposes in related XML technologies
XQuery
XSLT
SQL/XML
W3C Recommendation November 1999 www.w3.org/TR/xpath
Note
Very good idea to get familiar with XPath
XPath is the foundation for understanding other XML technologies
Additional Information
Web Sites
www.w3schools.com/XPath/xpath_intro.asp: W3Schools is always a good place to start
www.stylusstudio.com/w3c/xpath/: A very good and quite elaborate tutorial
www.devarticles.com/c/a/XML/Introduction-to-XPath/: Good 4-page tutorial
pierre.senellart.com/wdmd/chap-xpath.pdf: A description of the XPath data model
Tools
pgfearo.googlepages.com/: A very good tool for playing around with XPath
There is an introduction screencast
http://www.bit-101.com/xpath/: A good online tool