Transforming XML: XML Namespaces, XSLT
Mar 28, 2015
XML Namespaces
• Sometimes it is necessary to mix XML elements
– Different types of content
– Use of markup to convey meta-information
• Some documents combine markup from different XML languages
• But:
– Elements and attributes from different XML languages may share the same name
– Need to group elements for processing
XML Namespaces
• Namespaces in XML is the W3C standard for distinguishing elements and attributes that come from different XML languages
• Namespaces are declared using attributes (xmlns or xmlns:prefix)
• Elements from the same namespace can be recognised by software as a group
• Each namespace is identified by a unique URI
URL, URN, URI
• URL: a Uniform Resource Locator
– specifies where a resource is and the mechanism by which it is accessed
– e.g. http://www.comp.rgu.ac.uk/home.html
• URN: a Uniform Resource Name
– a unique sequence of characters naming an internet resource, e.g. urn:Turquoise.Inflatable.Walrus
– the name persists even if the resource becomes unavailable
• URI: a Uniform Resource Identifier
– a URL or a URN (see RFC 2396 at www.ietf.org, since superseded by RFC 3986)
the namespace prefix
• a short string representing the namespace URI
• distinguishes element and attribute names
• defined using an xmlns:prefix attribute
– <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
• a prefixed element name is called a qualified name, or QName, or a raw name
• QName syntax: prefix:local_part
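To see what a namespace-aware parser does with a QName, here is a minimal sketch using Python's standard-library xml.etree.ElementTree (our choice of parser, not something the slides prescribe). The parser resolves the prefix through the xmlns:xs declaration and reports the element under its expanded {URI}local_part name; the helper split_expanded is hypothetical, written only for this illustration.

```python
import xml.etree.ElementTree as ET

# The xs prefix is bound to the XML Schema namespace URI by xmlns:xs.
doc = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'
root = ET.fromstring(doc)

# ElementTree drops the prefix and keeps the expanded name {URI}local_part.
print(root.tag)  # {http://www.w3.org/2001/XMLSchema}schema

def split_expanded(tag):
    """Hypothetical helper: split '{URI}local' into (URI, local_part)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # the name is not in any namespace

print(split_expanded(root.tag))
```

The prefix itself does not survive parsing; only the URI and the local part do.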
example
• SVG and MathML both contain a set element
• Both SVG and MathML can be embedded in XHTML documents
• prefixes svg and mathml are used to distinguish the set elements
<svg:set> is distinct from <mathml:set>
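A small sketch of the same idea with Python's xml.etree.ElementTree (an assumption: any namespace-aware parser behaves alike). The fragment is hypothetical and reduced to the bare elements; it uses the published SVG and MathML namespace URIs. Both elements have the local name set, yet they end up with different expanded names, so software can tell them apart and search for one group only.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical fragment: two vocabularies each contribute a "set" element.
doc = f"""<body xmlns:svg="{SVG}" xmlns:mathml="{MATHML}">
  <svg:set/>
  <mathml:set/>
</body>"""
root = ET.fromstring(doc)
svg_set, mathml_set = list(root)

print(svg_set.tag)     # {http://www.w3.org/2000/svg}set
print(mathml_set.tag)  # {http://www.w3.org/1998/Math/MathML}set
print(svg_set.tag == mathml_set.tag)  # False

# Searching with a prefix-to-URI map selects only the matching element.
ns = {"svg": SVG, "mathml": MATHML}
print(len(root.findall("svg:set", ns)))  # 1
```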
example 2: XML with multiple namespaces
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xlink="http://www.w3.org/1999/xlink">
<head><title>Three Namespaces</title></head>
<body>
<h1 align="center">Ellipse and Rectangle</h1>
<svg xmlns="http://www.w3.org/2000/svg"
     width="12cm" height="10cm">
<ellipse rx="110" ry="130"/>
<rect x="4cm" y="1cm"
      width="3cm" height="6cm"/>
</svg>
<p xlink:type="simple"
   xlink:href="ellipses.html">
More about ellipses </p>
<p xlink:type="simple"
   xlink:href="rectangles.html">
More about rectangles </p>
<hr />
<p>Last Modified 7th October 2003</p>
</body></html>
• the first line is the XML declaration
• the default namespace declared on the root html element puts html and all its unprefixed descendants in the XHTML namespace (no prefix needed)
• the xlink prefix is associated with the XLink namespace everywhere within the root element
• the svg element declares its own default namespace, so svg and all its unprefixed descendants (ellipse, rect) are in the SVG namespace
• the remaining elements (head, title, body, h1, p, hr) are in the XHTML namespace
• the prefixed attributes xlink:type and xlink:href are in the XLink namespace
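To check which namespace each element of example 2 ends up in, here is a minimal sketch using Python's standard-library ElementTree parser. The document string is example 2 with the XML declaration left out; namespace_of is a hypothetical helper. Every element name is reported in expanded {URI}local form, so counting URIs shows that no element belongs to XLink (only the two attributes do), and that the svg subtree has switched default namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"
XLINK = "http://www.w3.org/1999/xlink"

# Example 2 as a string (XML declaration omitted), ellipse and rect closed.
doc = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xlink="http://www.w3.org/1999/xlink">
<head><title>Three Namespaces</title></head>
<body>
<h1 align="center">Ellipse and Rectangle</h1>
<svg xmlns="http://www.w3.org/2000/svg" width="12cm" height="10cm">
<ellipse rx="110" ry="130"/>
<rect x="4cm" y="1cm" width="3cm" height="6cm"/>
</svg>
<p xlink:type="simple" xlink:href="ellipses.html">More about ellipses</p>
<p xlink:type="simple" xlink:href="rectangles.html">More about rectangles</p>
<hr />
<p>Last Modified 7th October 2003</p>
</body></html>"""

root = ET.fromstring(doc)

def namespace_of(tag):
    """Hypothetical helper: return the URI part of a '{URI}local' tag."""
    return tag[1:].split("}", 1)[0]

# Count elements per namespace URI (root.iter() includes the root itself).
counts = Counter(namespace_of(el.tag) for el in root.iter())
for uri, n in counts.items():
    print(uri, n)

# Prefixed attributes are keyed by their expanded {URI}local name.
first_p = root.find(f"{{{XHTML}}}body/{{{XHTML}}}p")
print(first_p.get(f"{{{XLINK}}}href"))  # ellipses.html
```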
more on namespaces
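• a namespace can be declared on the element where it is used or on an ancestor such as the root; the declaration applies to that element and all its descendants unless it is redeclared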
• namespaces are identified by the URI, not the prefix used in a particular document
• the parser doesn’t look up the URI – it is only there as a unique identifier!
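A quick way to convince yourself that only the URI matters is to parse the same element bound three different ways. This sketch uses Python's ElementTree; the URI http://example.com/ns/people is hypothetical and nothing needs to exist at that address, because the parser never fetches it. Different prefixes (or no prefix) with the same URI give identical names, while the same prefix with a different URI gives a different name.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: used purely as a unique name, never looked up.
URI = "http://example.com/ns/people"

# The same URI bound with prefix p, prefix q, and as the default namespace.
a = ET.fromstring(f'<p:person xmlns:p="{URI}"/>')
b = ET.fromstring(f'<q:person xmlns:q="{URI}"/>')
c = ET.fromstring(f'<person xmlns="{URI}"/>')
print(a.tag == b.tag == c.tag)  # True
print(a.tag)                    # {http://example.com/ns/people}person

# Same prefix, different URI: a different element as far as software is concerned.
d = ET.fromstring('<p:person xmlns:p="http://example.com/ns/other"/>')
print(a.tag == d.tag)           # False
```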
more on namespaces
• namespaces are completely independent of DTDs
• QNames, if used, must be declared in the DTD exactly as written (prefix included) for the document to be valid
– parameter entities are used to get round this
• ingenious but awkward kludge
• not required for this module!
• namespaces are important in XSLT documents
Introduction to XSLT
what is XSL?
• XML & client/server model
– XML sits on the server but does not do anything by itself
– XSL provides client views of the data
• XSL: eXtensible Stylesheet Language
– two separate languages, each with its own namespace:
• XSL-FO (Formatting Objects)
• XSLT (Transformations)
– XPath is used to navigate XML
• XSLT defines rules for transforming a source XML document into a target document
what is XSLT?
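• Transforms a source tree into a result tree by:
– Selecting elements
– Selecting attributes
– Rearranging elements
– Sorting elements
– Applying conditional tests
• XML/XSLT is similar to HTML/CSS: the stylesheet is kept separate from the content, although XSLT can also restructure that content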
the XSLT transformation process
[Figure: an XSLT processor reads the XML source and the XSLT document (a set of template rules), matches elements in the source and replaces them using the template rules, and writes the output document]
a simple XSLT example
the source
<?xml version = "1.0"?>
<people>
<person born="1912" died="1954">
<name><first_name>Alan</first_name>
<last_name>Turing</last_name>
</name>
<profession>computer scientist</profession>
<profession>mathematician</profession>
<profession>cryptographer</profession>
</person>
<person born="1918" died="1988">
<name><first_name>Richard</first_name>
<middle_initial>P</middle_initial>
<last_name>Feynman</last_name>
</name>
<profession>physicist</profession>
<hobby>playing the bongoes</hobby>
</person>
</people>
a simple XSLT example
the transforming stylesheet
<?xml version = "1.0"?>
<xsl:stylesheet version = "1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"></xsl:stylesheet>
• the xsl prefix marks QNames as belonging to the XSLT namespace, whose URI is http://www.w3.org/1999/XSL/Transform
• the empty stylesheet
– contains no template rules
– will apply the built-in default rules (see below)
a simple XSLT example
the output of the transform
Alan
Turing
computer scientist
mathematician
cryptographer
Richard
P
Feynman
physicist
playing the bongoes
the default behaviour strips out the markup and outputs the text content of the XML document, including whitespace such as tabs and line breaks; attribute values (e.g. born and died) are not output
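The built-in rules can be mimicked in a few lines, which makes it clear why only text survives. This is a Python sketch using ElementTree, not an XSLT processor: builtin_rules is a hypothetical function and the source is a cut-down copy of the people document. For an element it processes the children in document order and copies text nodes; attributes are never visited, so born and died disappear.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the people document (abridged for brevity).
source = """<people>
  <person born="1912" died="1954">
    <name><first_name>Alan</first_name> <last_name>Turing</last_name></name>
    <profession>mathematician</profession>
  </person>
</people>"""

def builtin_rules(elem):
    """Hypothetical sketch of XSLT's built-in rules with no other templates:
    element -> process its children in order; text -> copy it.
    Attributes are never selected, so their values are not output."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(builtin_rules(child))
        parts.append(child.tail or "")  # text following the child, still inside elem
    return "".join(parts)

output = builtin_rules(ET.fromstring(source))
print(output)
```

The result is the same as joining all the text nodes of the root element, whitespace included.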
to modify the default behaviour, we add template rules that describe how to transform elements of the source document
template rules
• a template rule is defined by an <xsl:template> element
• the match attribute contains a pattern identifying the input to which the rule is applied
• the content of the element is a template for the output from the matched pattern
<xsl:template match="pattern">
template
</xsl:template>
example 2
<?xml version = "1.0"?>
<xsl:stylesheet version = "1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match = "person">
A Person
</xsl:template>
</xsl:stylesheet>
example 2 output
A Person
A Person
Each person element in the original document has been replaced entirely by the template.
The whitespace outside each person element has been preserved
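The replacement behaviour of example 2 can be sketched the same way: if a template matches an element, its output replaces the element and everything inside it; otherwise the built-in rule applies. This Python sketch (ElementTree, hypothetical apply_templates function, abridged source) matches on tag names only, which is a simplification of real XSLT patterns.

```python
import xml.etree.ElementTree as ET

# Abridged people document.
source = """<people>
  <person born="1912" died="1954"><name>Alan Turing</name></person>
  <person born="1918" died="1988"><name>Richard Feynman</name></person>
</people>"""

def apply_templates(elem, templates):
    """Hypothetical sketch: a matching template replaces elem entirely;
    otherwise fall back to the built-in rule (process children, copy text)."""
    if elem.tag in templates:
        return templates[elem.tag](elem)
    parts = [elem.text or ""]
    for child in elem:
        parts.append(apply_templates(child, templates))
        parts.append(child.tail or "")  # whitespace between elements is kept
    return "".join(parts)

# Stands in for <xsl:template match="person">A Person</xsl:template>
templates = {"person": lambda elem: "A Person"}
output = apply_templates(ET.fromstring(source), templates)
print(output)
```

Nothing inside a matched person (not even the name text) reaches the output, while the whitespace between the person elements does.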
example 3
<?xml version = "1.0"?>
<xsl:stylesheet version = "1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match = "person">
<p>A Person</p>
</xsl:template>
</xsl:stylesheet>
elements used in a template must be well-formed (every start tag needs its matching end tag), because the stylesheet is itself an XML document
example 3 output
<p>A Person</p>
<p>A Person</p>
The <p> and </p> tags have also been copied over from the template
The whitespace outside each person element has been preserved
xsl:value-of
• xsl element that extracts the string value of an element in the source XML
– the string value is the text content after:
• all tags have been removed
• entity and character references have been resolved
• the select attribute contains an XPath expression identifying the node (element or attribute) whose value is taken; if it selects several nodes, XSLT 1.0 uses the first one
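The string value described above can be reproduced with ElementTree's itertext(), which yields the element's text and all descendant text in document order with the tags removed. A minimal Python sketch; string_value is a hypothetical helper and the input is the first name element from the people document. The parser has already resolved entity and character references, so &amp; and &#65; come out as & and A.

```python
import xml.etree.ElementTree as ET

# The first person's name element, line breaks as in the source document.
name = ET.fromstring(
    "<name><first_name>Alan</first_name>\n"
    "<last_name>Turing</last_name>\n"
    "</name>"
)

def string_value(elem):
    """Hypothetical helper: all descendant text, tags removed, in document order."""
    return "".join(elem.itertext())

print(repr(string_value(name)))  # 'Alan\nTuring\n'

# Entity and character references are resolved by the parser.
print(string_value(ET.fromstring("<x>R&amp;D &#65;</x>")))  # R&D A
```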
example 4
<?xml version = "1.0"?>
<xsl:stylesheet version = "1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match = "person">
<p>
<xsl:value-of select = "name" />
</p>
</xsl:template>
</xsl:stylesheet>
example 4 output
<p>
Alan
Turing
</p>
<p>
Richard
P
Feynman
</p>
the full text content of the <name> element after the <first_name>, <middle_initial>, and <last_name> tags have been stripped out
The whitespace inside each name element has been preserved along with the rest of the text content
example 4a
<?xml version = "1.0"?>
<xsl:stylesheet version = "1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match = "person">
<p>
<xsl:value-of select = "@born" />-
<xsl:value-of select = "name" />
</p>
</xsl:template>
</xsl:stylesheet>
example 4a output
<p>
1912 -
Alan
Turing
</p>
<p>
1918 -
Richard
P
Feynman
</p>
the value of the born attribute of the <person> element (selected with @born) is added to the output
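In ElementTree terms, @born corresponds to reading an attribute from the context element, and name to taking the string value of a child element. The Python sketch below (abridged input, single line instead of the original layout) imitates example 4a; using get("missing", "") to mirror the empty string that xsl:value-of produces when an attribute is absent is our own modelling choice.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring(
    '<person born="1912" died="1954">'
    "<name><first_name>Alan</first_name> <last_name>Turing</last_name></name>"
    "</person>"
)

# select="@born": the value of the born attribute of the context node.
born = person.get("born")
# select="name": the string value of the first name child.
name = "".join(person.find("name").itertext())
print(f"{born} - {name}")  # 1912 - Alan Turing

# A missing attribute gives an empty string, as xsl:value-of would.
print(repr(person.get("missing", "")))  # ''
```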
<xsl:apply-templates>
• xsl element that can change the default order of processing:
– which elements should be processed next
– process elements in the middle of processing another element
– prevent particular elements from being processed
• the select attribute contains an XPath expression selecting the nodes to be processed at that point; without select, all child nodes are processed
example 5
<?xml version = "1.0"?>
<xsl:stylesheet version = "1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match = "person">
<xsl:apply-templates select = "name" />
</xsl:template>
<xsl:template match = "name">
<xsl:value-of select = "last_name" />,
<xsl:value-of select = "first_name" />
</xsl:template>
</xsl:stylesheet>
example 5 output
Turing,
Alan
Feynman,
Richard
The order of processing has been changed.
The output for each <name> consists of the full text content of the <last_name>, followed by a comma and a new line, followed by the full text content of the <first_name>.
The <profession> and <hobby> elements are never processed because the person template applies templates only to its <name> child
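Example 5 can be traced by hand in Python to see why profession and hobby vanish. In this sketch (ElementTree, abridged single-line source), person_template and name_template are hypothetical functions standing in for the two xsl:template rules; the root people element falls back to the built-in rule. Because person_template only visits name children, nothing else inside a person is ever reached.

```python
import xml.etree.ElementTree as ET

source = """<people>
<person born="1912" died="1954"><name><first_name>Alan</first_name><last_name>Turing</last_name></name><profession>mathematician</profession></person>
<person born="1918" died="1988"><name><first_name>Richard</first_name><middle_initial>P</middle_initial><last_name>Feynman</last_name></name><hobby>playing the bongoes</hobby></person>
</people>"""

def name_template(name):
    # value-of last_name, then ",\n", then value-of first_name.
    # findtext returns the element's own text, which here equals its string value.
    return f"{name.findtext('last_name')},\n{name.findtext('first_name')}"

def person_template(person):
    # apply-templates select="name": only the name children are processed.
    return "".join(name_template(n) for n in person.findall("name"))

root = ET.fromstring(source)
# Built-in rule for <people>: process children in order, copy text between them.
parts = [root.text or ""]
for person in root:
    parts.append(person_template(person))
    parts.append(person.tail or "")
output = "".join(parts)
print(output)
```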
XSLT processor
• a software component that
– reads an XML source document and a stylesheet
– applies the transformation rules
– outputs the transformed document
• standalone
– Saxon
– Apache Xalan (used in NetBeans)
• built into a browser or application server
– MSXML (built in to IE6)
– Apache Cocoon (an XML publishing framework from the Apache Software Foundation)
Stylesheet Example – XML (catalog.xml)
<?xml version="1.0" ?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10.90</price>
    <year>1985</year>
  </cd>
  <cd>
    <title>ESSSSSSSSS</title>
    <artist>Bruce</artist>
    <country>Uk</country>
    <company>Cola</company>
    <price>12.90</price>
    <year>1988</year>
  </cd>
</catalog>
Stylesheet Example – XSL (cdcatalog.xsl)
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
  <h2>My CD Collection</h2>
  <xsl:apply-templates/>
  </body>
  </html>
</xsl:template>
<xsl:template match="cd">
  <p>
  Title: <xsl:value-of select="title"/>
  <br />
  Artist: <xsl:value-of select="artist"/>
  </p>
</xsl:template>
</xsl:stylesheet>
Stylesheet Example – Linking XML to Stylesheet
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <country>USA</country>
    <company>Columbia</company>
    <price>10.90</price>
    <year>1985</year>
  </cd>
  <cd>
    <title>ESSSSSSSSS</title>
    <artist>Bruce</artist>
    <country>Uk</country>
    <company>Cola</company>
    <price>12.90</price>
    <year>1988</year>
  </cd>
</catalog>
Stylesheet Example – HTML Output
My CD Collection
Title: Empire Burlesque
Artist: Bob Dylan
Title: ESSSSSSSSS
Artist: Bruce
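For comparison, here is roughly the same transformation written by hand in Python with ElementTree, building the HTML tree that cdcatalog.xsl describes. It is a sketch under simplifying assumptions: the catalog is abridged (country, price and year omitted), and the whitespace text nodes that a real XSLT processor would copy between the cd elements are ignored. The loop plays the part of the match="cd" template; the company value never reaches the output because that template selects only title and artist.

```python
import xml.etree.ElementTree as ET

# Abridged catalog.xml.
catalog = ET.fromstring("""<catalog>
  <cd><title>Empire Burlesque</title><artist>Bob Dylan</artist><company>Columbia</company></cd>
  <cd><title>ESSSSSSSSS</title><artist>Bruce</artist><company>Cola</company></cd>
</catalog>""")

# Stands in for the match="/" template: the page skeleton.
html = ET.Element("html")
body = ET.SubElement(html, "body")
ET.SubElement(body, "h2").text = "My CD Collection"

# Stands in for the match="cd" template, applied to each cd in document order.
for cd in catalog.iter("cd"):
    p = ET.SubElement(body, "p")
    p.text = "Title: " + cd.findtext("title")
    br = ET.SubElement(p, "br")
    br.tail = "Artist: " + cd.findtext("artist")  # text after <br />

out = ET.tostring(html, encoding="unicode")
print(out)
```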
Summary
• Namespaces
– allow elements from different XML languages to be included in the same XML document
– e.g. xmlns:xlink="http://www.w3.org/1999/xlink"
• XSLT
– <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
– Create templates with <xsl:template match="/">
– Select content with
• <xsl:value-of select="name" />
• <xsl:value-of select="@id" />
– Control content with
• <xsl:apply-templates/>
• <xsl:apply-templates select="name" />
Useful websites
• Standards:
– www.w3.org/Style/XSL/ (W3C home page for the XSL standards)
– www.w3.org/TR/xslt
– www.w3.org/TR/xpath
– www.w3.org/TR/xsl/
• Tutorials/Forums
– www.w3schools.com/xsl
– www.learn-xslt-tutorial.com/
– www.xml.com
– www.tizag.com/xmlTutorial/xslttutorial.php