XML DATA TRANSFORMATION
• XML was first introduced as a metalanguage for data
description.
• A metalanguage represents a well-defined interface that
evolves and is transformed into derived languages. XML is
simply the foundation interface for a number of specific
markup languages, each of which is based on its own
vocabulary and schema.
• The schema syntactically differentiates XML languages from
each other. XML is key for data exchange and interoperability,
and the schema is essential for providing XML documents with
a typed and well-defined structure. Unfortunately, in the
imperfect world in which we live, schemas often express the
same semantics through different syntaxes.
• An XML transformation is simply the XML workaround for this
relatively common situation. An XML transformation is a
user-defined algorithm that attempts to express the semantics
of a given document using another, equivalent syntax. A
transformation is much like a type cast in programming. You
can always try to coerce the type, but in doing so you may have
to accept compromises such as syntax adaptations and,
sometimes, loss of data and logic.
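As a minimal sketch of the idea, the style sheet below re-expresses the same data in a different syntax, using XSLT (the transformation language introduced later in this section). The element and attribute names (employee, id, lastname, Employee, ID, LastName) are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0"?>
<!--
  Assumed source (hypothetical schema, data carried in attributes):
    <employee id="1" lastname="Davolio"/>
  Intended result (hypothetical schema, data carried in child elements):
    <Employee><ID>1</ID><LastName>Davolio</LastName></Employee>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Rewrite each attribute-based employee as an element-based one -->
  <xsl:template match="employee">
    <Employee>
      <ID><xsl:value-of select="@id"/></ID>
      <LastName><xsl:value-of select="@lastname"/></LastName>
    </Employee>
  </xsl:template>
</xsl:stylesheet>
```

Both documents carry the same information. Any attribute on the source that the template does not copy would be lost, which is the kind of compromise the type-cast analogy refers to.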
• In XML, the transformation process is seen as the application
of a style sheet to the source document. The style sheet is a
declarative and user-defined document that is referred to as
extensible. The term Extensible Stylesheet Language (XSL)
indicates a metalanguage designed for expressing style sheets
for XML documents. An XSL file contains the set of rules that
will be used to transform a document into another.
• XSL files were originally conceived as the XML counterpart of
HTML's cascading style sheets (CSS). In this context, XSL files
were simply extensible and user-definable tools to render an
XML markup in HTML for display purposes. The growing
complexity of style sheets, as well as the advent of XML
schemas, changed the perspective of XSL and led to XSL
Transformations (XSLT).
• The goal of XSL has evolved over time. Today, XSL is a blanket
term for a number of derived technologies that altogether
better qualify and implement the original idea of styling XML
documents. The various components that fall under the
umbrella of XSL are the actual software entities that you use in
your code:
• XSLT: Rule-based language for transforming XML documents
into any other text-based format. XSLT provides for XML-to-XML
transformation, which mostly means schema transformation.
An XSLT program is a generic set of transformation rules whose
output can be any text-based language, including HTML, Rich
Text Format (RTF), and Wireless Markup Language (WML).
• XPath: Query language that XSLT programs use to select
specific parts of an XML document. The result of an XPath
expression is then processed by the XSLT processor.
• XSL Formatting Objects (XSL-FO): Advanced styling features
expressed by an XML vocabulary that defines the semantics of a
set of formatting elements.
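To show how XPath fits inside XSLT, here is a small sketch in which every select attribute holds an XPath expression. The source document and its element names are assumptions made for the example, not part of any particular schema.

```xml
<?xml version="1.0"?>
<!--
  Assumed source document (hypothetical names):
    <employees>
      <employee id="1"><lastname>Davolio</lastname></employee>
      <employee id="2"><lastname>Fuller</lastname></employee>
    </employees>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Location path with a predicate: picks one element by attribute value -->
    <xsl:value-of select="/employees/employee[@id='2']/lastname"/>
    <xsl:text>&#10;</xsl:text>
    <!-- XPath function call: counts every employee element in the document -->
    <xsl:value-of select="count(//employee)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Against the assumed source, the first expression selects the lastname of the employee whose id is 2, and the second yields the number of employee elements. The XSLT processor converts each result to text before writing it out.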
XSLT TEMPLATE PROGRAMMING
XSLT is a process that combines two XML documents—the XML source
file and the style sheet—to produce a third document. The resultant
document can be an XML document, an HTML page, or any text-based
file the style sheet has been instructed to generate.
The source document must meet only one requirement: it must be a
well-formed XML document. The style sheet must be a valid XML
document that contains the transformation logic expressed using
the elements in the XSLT vocabulary. An XSLT style sheet can be
seen as a sequence of templates.
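The "sequence of templates" view can be sketched as follows. This is an illustrative style sheet, assuming a hypothetical source whose root element employees contains employee elements, each with a lastname child. It produces an HTML list.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Template 1: matches the document root and builds the page skeleton -->
  <xsl:template match="/">
    <html>
      <body>
        <ul>
          <!-- Hand each employee node to whichever template matches it -->
          <xsl:apply-templates select="employees/employee"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- Template 2: matches one employee and emits one list item -->
  <xsl:template match="employee">
    <li><xsl:value-of select="lastname"/></li>
  </xsl:template>
</xsl:stylesheet>
```

The order of the templates in the file does not drive execution. The processor starts at the root node, and xsl:apply-templates decides which nodes are processed next.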
AN OVERVIEW OF THE XSLT PROCESS
• The core part of the transformation process is the application
of templates to XML source elements. Other ancillary steps
might include the expansion of elements to text, the execution
of some script code, and the selection of a subset of nodes
using XPath queries. The layout of a generic XSLT script is
shown here:
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        ⋮
      </xsl:template>
      <xsl:template match="...">
        ⋮
      </xsl:template>
      ⋮
    </xsl:stylesheet>
• The final output of each template must form a syntactically
valid fragment in the target language, be it XML, HTML, RTF, or
some other language. You are not required to indicate the
target language explicitly, although the XSLT vocabulary
provides a tailor-made instruction to declare what the expected
output will be. The main requirement for the XSLT style sheet is
that its overall text be well-formed XML. In addition, it must
make syntactically correct use of all the XSLT instructions it
needs. The syntax of each …
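A short sketch of both requirements, assuming HTML as the target language: the xsl:output instruction declares the expected output, and the literal markup inside the template is written as well-formed XML even though the result is HTML.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Optional declaration of the target language;
       XSLT 1.0 defines the methods xml, html, and text -->
  <xsl:output method="html"/>

  <xsl:template match="/">
    <!-- The line break must be written as an empty XML element here,
         otherwise the style sheet itself would not be well-formed -->
    <p>First line<br/>Second line</p>
  </xsl:template>
</xsl:stylesheet>
```

With the html method, a conforming processor serializes the empty br element in HTML style, without an end tag. If xsl:output is omitted, XSLT 1.0 processors default to the xml method unless the result's root element is html.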
APPLYING AN XSLT TEMPLATE TO SOURCE MARKUP TEXT.
XSLT INSTRUCTIONS
• The XSLT vocabulary consists of special tags that represent
particular operations you can perform on the source markup
text or passed arguments. Although the overall syntax is that
of a rigorous XML dialect, you can easily recognize the main
constructs of a high-level programming language.
• The following subsections summarize the main XSLT
instructions you are likely to run across in your XSLT
experience. The XSLT instructions are divided into four
categories: templates, data manipulation, control flow, and
layout.
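The sketch below uses one or two standard XSLT 1.0 instructions from each of the four categories. The assignment of each instruction to a category in the comments is an illustrative reading of that grouping, and the source and result element names (employees, employee, lastname, staff, person) are hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Templates: xsl:template declares a rule for matching nodes -->
  <xsl:template match="/employees">
    <!-- Data manipulation: xsl:variable stores a computed value -->
    <xsl:variable name="total" select="count(employee)"/>
    <staff count="{$total}">
      <!-- Control flow: xsl:for-each iterates over the selected nodes,
           and xsl:sort (data manipulation) orders the iteration -->
      <xsl:for-each select="employee">
        <xsl:sort select="lastname"/>
        <!-- Layout: xsl:element and xsl:attribute build output markup -->
        <xsl:element name="person">
          <xsl:attribute name="id">
            <xsl:value-of select="@id"/>
          </xsl:attribute>
          <!-- Control flow: xsl:choose branches on a condition -->
          <xsl:choose>
            <xsl:when test="lastname">
              <xsl:value-of select="lastname"/>
            </xsl:when>
            <xsl:otherwise>unknown</xsl:otherwise>
          </xsl:choose>
        </xsl:element>
      </xsl:for-each>
    </staff>
  </xsl:template>
</xsl:stylesheet>
```

The variable, loop, and conditional map directly onto constructs familiar from high-level programming languages, while the syntax remains plain XML.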
THE STYLE SHEET IS FIRST NORMALIZED TO AN XPATH NAVIGATOR AND THEN COMPILED
MANAGING THE PROCESSOR'S STATE
• The style sheet compiler populates three internal data
structures with the data read from the source.
PERFORMING TRANSFORMATIONS
FURTHER READING:
• http://www.w3.org/TR/xslt