More XML
XML Schema, XPath, XSLT
CS 431 – February 21, 2005
Carl Lagoze – Cornell University
Acknowledgements to http://www.w3schools.com/schema/default.asp
xHTML
• HTML “expressed” in XML
• Corrects defects in HTML
  – All tags closed
  – Proper nesting
  – Case sensitive (all tags lower case)
  – Strict well-formedness
• Defined by a DTD
  – <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  – <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
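The well-formedness rules above can be checked mechanically with any XML parser. The following is a minimal sketch using Python's standard library; the markup fragments and the helper name `is_well_formed` are made up for illustration, and the check covers well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments, for illustration only.
html_style = "<p>First line<br>Second line</p>"     # <br> is never closed
xhtml_style = "<p>First line<br/>Second line</p>"   # every tag is closed
mixed_case = "<P>Some text</p>"                     # tag names differ in case


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


for fragment in (html_style, xhtml_style, mixed_case):
    print(is_well_formed(fragment), fragment)
```

Only the second fragment parses: the first leaves a tag open, and the third fails because XML tag names are case sensitive.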
xHTML (cont.)
• All new HTML SHOULD be xHTML
• W3C validator
  – http://validator.w3.org
• Tidy
  – http://sourceforge.net/projects/jtidy
A little context
Traditional Library: central control, uniform expertise
Traditional Web: distributed, interlinked, viewable documents
• XML – Markup Syntax
• URIs – Name Convention
• HTTP – Access Method
• Schema – Type Definition
• Namespaces – Concept Integration
• XPath – Data Decomposition
• XSLT – Data Transformation
• DTD – Tag Sets
• RDF – Semantic Relationships
• OWL – Concept Building
XML Schema Define…
• Elements
• Attributes
• Nesting structure (parent/child relationships)
• Sibling sequence
• Sibling cardinality
• Presence or absence of text values
• Element and attribute data types
• Element and attribute default values
Structure of a schema
• Well-formed XML document
• Elements are in the schema namespace
• Root is the <schema> element
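The three properties above can be seen by parsing a schema like any other XML document. This is a small sketch with Python's standard library; the one-element schema (declaring a `note` element) is a made-up example, not one of the course files.

```python
import xml.etree.ElementTree as ET

# The XML Schema namespace URI.
XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical minimal schema, for illustration only.
schema_text = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note" type="xs:string"/>
</xs:schema>
"""

# A schema is well-formed XML, so an ordinary XML parser reads it.
root = ET.fromstring(schema_text)

# ElementTree reports tags as "{namespace}localname".
print(root.tag)

# Names of the top-level element declarations in the schema.
declared = [e.get("name") for e in root.findall(f"{{{XS}}}element")]
print(declared)
```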
Instantiation of a schema
• Note namespaces!!
Simple vs. Complex Values
• Element with complex value contains other elements (has children)
• Element with simple value does not have children (e.g. text).
Simple Value Types
• Restriction on type of content
• Syntax
  – <xs:element name="xxx" type="yyy"/>
• Examples
  – <xs:element name="lastname" type="xs:string"/>
  – <xs:element name="age" type="xs:integer"/>
  – <xs:element name="dateborn" type="xs:date"/>
Facets
• Restrictions on values within type context
• Examples
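As one possible illustration of facets, the sketch below embeds a schema fragment that restricts an integer to the range 0 to 120 using `xs:minInclusive` and `xs:maxInclusive`, then reads the two facets back with Python's standard library. The `age` element and the helper `age_is_valid` are assumptions made for the example; the hand-rolled check only mimics these two facets and is not a schema validator.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Hypothetical schema fragment: an integer restricted by two facets.
schema_text = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="age">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="120"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
"""

restriction = ET.fromstring(schema_text).find(
    "xs:element/xs:simpleType/xs:restriction", NS)
low = int(restriction.find("xs:minInclusive", NS).get("value"))
high = int(restriction.find("xs:maxInclusive", NS).get("value"))


def age_is_valid(text: str) -> bool:
    """Check a text value against the two facets read above (sketch only)."""
    try:
        value = int(text)
    except ValueError:
        return False
    return low <= value <= high


print(age_is_valid("42"), age_is_valid("130"), age_is_valid("abc"))
```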
String types and patterns
Simple Example
• Memo Schema
  – http://www.cs.cornell.edu/lagoze/courses/CS431/2005sp/Examples/Lecture9/memo.xsd
• Instance Document
  – http://www.cs.cornell.edu/lagoze/courses/CS431/2005sp/Examples/Lecture9/memo.xml
Complex Types
• Type definition defines element nesting
Controls on complex types
• sequence – specific order
• all – any order
• choice – only one
• cardinality – minOccurs, maxOccurs
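A sketch of how these controls appear in a schema: the fragment below declares a complex type whose children must occur in a fixed `sequence`, with `minOccurs` and `maxOccurs` setting the cardinality of each child. The `memo` structure here is invented for illustration and is not the course's memo.xsd. The Python code only reads the declarations back; both attributes default to 1 when omitted.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Hypothetical complex type: ordered children with explicit cardinality.
schema_text = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="memo">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string" maxOccurs="unbounded"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="body" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

sequence = ET.fromstring(schema_text).find(
    "xs:element/xs:complexType/xs:sequence", NS)

# Map each child name to (minOccurs, maxOccurs), applying the default of 1.
cardinality = {
    child.get("name"): (child.get("minOccurs", "1"), child.get("maxOccurs", "1"))
    for child in sequence
}
print(cardinality)
```

Replacing `xs:sequence` with `xs:all` would allow the children in any order, and `xs:choice` would allow exactly one of them.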
Complex Type Extension
• Add values to sequence
Mixed Content
Declaring attributes
• Define type
  – xs:string
  – xs:decimal
  – xs:integer
  – xs:boolean
  – xs:date
  – xs:time
• Define optional or required
Use of attributes
• An element that carries attributes is always a complex type
Type Reuse
Type Reuse Example
• Address schema
  – http://www.cs.cornell.edu/lagoze/courses/CS431/2005sp/Examples/Lecture9/address.xsd
• Person schema
  – http://www.cs.cornell.edu/lagoze/courses/CS431/2005sp/Examples/Lecture9/person.xsd
• Instance document
  – http://www.cs.cornell.edu/lagoze/courses/CS431/2005sp/Examples/Lecture9/person.xml
XPath
• Language for addressing parts of an XML document
  – Used by XSLT
  – Used by XPointer
• Tree model similar to DOM
• W3C Recommendation (1999)
  – http://www.w3.org/TR/xpath
Xpath Concepts
• Context Node
  – Current node in XML document that is basis of path evaluation
  – Defaults to root
• Location Steps – selection from context node
  – Axis – sub-tree(s) selection from context node
  – Node Test – select specific elements or node type(s)
  – Predicates – predicate for filtering after axis and node tests
Axis
• child: all children of context
• descendant: all children, grandchildren, …
• parent: the parent of context
• ancestor: parent, grandparent, … up to the root
Node Test
• Element name: e.g. “Book”
• Wildcard: *
• type(): where type is “node”, “text”, etc.
Predicate
• Boolean and comparative operators
• Types
  – Numbers
  – Strings
  – node-sets
• Functions
  – Examples
    • boolean starts-with(string, string)
    • number count(node-set)
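To make the two example functions concrete, here is a rough Python analogue. Python's standard `xml.etree.ElementTree` implements only a subset of XPath and has no function library, so plain Python stands in for `count()` and `starts-with()`; the `Library`/`Book`/`Title` document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, for illustration only.
doc = ET.fromstring("""\
<Library>
  <Book><Title>XML Basics</Title></Book>
  <Book><Title>XSLT Cookbook</Title></Book>
  <Book><Title>Databases</Title></Book>
</Library>
""")

books = doc.findall("Book")

# Analogue of count(child::Book): the size of the selected node-set.
book_count = len(books)

# Analogue of child::Book[starts-with(Title, "X")]: keep the books whose
# first <Title> child has text beginning with "X".
x_books = [b for b in books if (b.findtext("Title") or "").startswith("X")]
x_titles = [b.findtext("Title") for b in x_books]

print(book_count, x_titles)
```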
Combining all into a location set specification
• Syntax: axis::node-test[predicate]
• Examples:
  – child::Book[position() <= 3] – first three <Book> child elements of context
  – child::Book/attribute::color – “color” attributes of <Book> child elements of context
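The two example expressions can be approximated in Python. The standard library's `xml.etree.ElementTree` supports only a subset of XPath: it has no `position() <= 3` comparison and cannot return attribute nodes, so this sketch selects the elements with a path and finishes the job in plain Python. The sample document is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the root <Library> plays the role of the context node.
doc = ET.fromstring("""\
<Library>
  <Book color="red"><Title>A</Title></Book>
  <Book color="blue"><Title>B</Title></Book>
  <Book color="green"><Title>C</Title></Book>
  <Book color="black"><Title>D</Title></Book>
</Library>
""")

# child::Book[position() <= 3]: first three <Book> children of the context.
first_three = doc.findall("Book")[:3]
first_three_colors = [book.get("color") for book in first_three]

# child::Book/attribute::color: the color attribute of every <Book> child
# that has one ("Book[@color]" filters on the attribute's presence).
colors = [book.get("color") for book in doc.findall("Book[@color]")]

print(first_three_colors)
print(colors)
```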
Abbreviations
• Child axis is default
  – child::Book → Book
• Attribute axis → @
  – Book[position() = 1]/@color
• “.” (self), “..” (parent), “//” (descendant-or-self)
• [position() = n] → [n]
• Example
  – Book[2]/@color
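Several of these abbreviations are accepted by Python's `xml.etree.ElementTree`, which makes them easy to try out. The sketch below assumes a small made-up document; because ElementTree cannot select attribute nodes, the final `/@color` step is done with `.get("color")`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, for illustration only.
doc = ET.fromstring("""\
<Library>
  <Book color="red"><Title>A</Title></Book>
  <Book color="blue"><Title>B</Title></Book>
  <Book color="green"><Title>C</Title></Book>
</Library>
""")

# Book[2]/@color: the second <Book> child, then its color attribute.
second = doc.find("Book[2]")
second_color = second.get("color")

# .//Title: every <Title> descendant of the context node.
titles = [t.text for t in doc.findall(".//Title")]

# .//Title/..: the parent of each <Title>, reached with "..".
owner_colors = [b.get("color") for b in doc.findall(".//Title/..")]

print(second_color, titles, owner_colors)
```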
XML Transformations (XSLT)
• Origins: separate rendering from data
  – Roots in CSS
• W3C Recommendation
  – http://www.w3.org/TR/xslt
• Generalized notion of transformation for:
  – Multiple renderings
  – Structural transformation between different languages
  – Dynamic documents
• XSLT – rule-based (declarative) language for transformations
XSLT Capabilities
• Generate constant text
• Filter out content
• Change tree ordering
• Duplicate nodes
• Sort nodes
• Any computational task (XSLT is “Turing complete”)
XSLT Processing Model
Input XML doc → [parse] → Parsed tree → [XSLT] → Transformed tree → [serialize] → Output doc (xml, html, etc.)
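The three stages of the model (parse, transform, serialize) can be sketched in Python. Python's standard library has no XSLT processor, so a few lines of hand-written tree building stand in for the XSLT step here; the `memo` input and the HTML-like output shape are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document.
source_text = "<memo><to>Ann</to><from>Bob</from></memo>"

# 1. Parse: input text becomes a tree.
source_tree = ET.fromstring(source_text)

# 2. Transform: build a new result tree from the source tree.
#    (Hand-written stand-in for what an XSLT stylesheet would describe.)
result_tree = ET.Element("html")
body = ET.SubElement(result_tree, "body")
for child in source_tree:
    paragraph = ET.SubElement(body, "p")
    paragraph.text = f"{child.tag}: {child.text}"

# 3. Serialize: the result tree becomes output text.
output_text = ET.tostring(result_tree, encoding="unicode")
print(output_text)
```

The point of the sketch is that the transformation operates on trees, not on text: parsing and serialization sit on either side of it.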
XSLT “engine”
XML input + XSLT “program” → XSLT Engine (SAXON) → Output Document (xml, html, …)
Stylesheet Document or Program
• XML document rooted in <stylesheet> element
• Body is set of templates
  – XPath expression specifies elements in source tree
  – Body of template specifies contribution of source elements to result tree
• Not sequential execution
Template Form
• Elements from xsl namespace are transform instructions
• Match attribute value is xpath expression
• Non-xsl namespace elements are literals.
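Because a stylesheet is itself an XML document, its structure can be inspected with an ordinary XML parser. The sketch below embeds a small made-up stylesheet and uses Python's standard library to list each template's `match` expression and the literal (non-xsl) elements. It only reads the stylesheet; it does not execute it, since the standard library has no XSLT engine.

```python
import xml.etree.ElementTree as ET

# The XSLT namespace URI.
XSL = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical stylesheet with two templates, for illustration only.
stylesheet_text = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body><xsl:apply-templates select="memo/to"/></body></html>
  </xsl:template>
  <xsl:template match="to">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>
"""

root = ET.fromstring(stylesheet_text)

# Each template's match attribute is an XPath pattern over the source tree.
matches = [t.get("match") for t in root.findall(f"{{{XSL}}}template")]

# Elements outside the xsl namespace are literals copied to the result tree.
# ElementTree writes namespaced tags as "{uri}name", so unqualified tags
# are the literal result elements in this stylesheet.
literals = sorted({e.tag for e in root.iter() if not e.tag.startswith("{")})

print(root.tag)
print(matches)
print(literals)
```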
A simple example
• XML base file
  – http://www.cs.cornell.edu/Courses/cs502/2002SP/Demos/xslt/simple.xml
• XSLT file
  – http://www.cs.cornell.edu/Courses/cs502/2002SP/Demos/xslt/simple.xsl
XSLT Recursive Programming Style
• Document driven, template matching
  – Conflict resolution rules
  – Mode setting
    • <xsl:apply-templates mode="this">
    • <xsl:template match="foo" mode="this">
    • <xsl:template match="foo" mode="that">
  – Context setting
    • <xsl:apply-templates select="//bar">
XSLT Procedural Programming
• Sequential programming style
• Basics
  – for-each – loop through a set of elements
  – call-template – like a standard procedure call
For-each programming example
• XML base file
  – http://www.cs.cornell.edu/Courses/cs502/2002SP/Demos/xslt/foreach.xml
• XSLT file
  – http://www.cs.cornell.edu/Courses/cs502/2002SP/Demos/xslt/foreach.xsl
Call-template programming example
• XML base file
  – http://www.cs.cornell.edu/Courses/cs502/2002SP/Demos/xslt/call.xml
• XSLT file
  – http://www.cs.cornell.edu/Courses/cs502/2002SP/Demos/xslt/call.xsl
Result Tree Creation
• Literals – any element not in xsl namespace
• <xsl:text> – content directly to output
• <xsl:value-of> – expression processing
• <xsl:copy> and <xsl:copy-of> – copy current node or selected nodes into result tree
• <xsl:element> – instantiate an element
• <xsl:attribute> – instantiate an attribute
Various other programming constructs
• Conditionals
• Variables (declaration and use)
• Some type conversion
• Sorting
Resources
• XSLT – WROX Press
  – ISBN 1-861005-06-7
• W3C XSLT Page
  – http://www.w3.org/Style/XSL/
• Arbortext XSL Tutorial
  – http://www.nwalsh.com/docs/tutorials/xsl/