XML
Instructor: Charles Moen
CSCI/CINF 4230
XML
Extensible Markup Language
A set of rules that allow you to create your own markup language
Designed for delivering data over the Web in text files that are self-describing and readable both by computer programs and by humans
The XML specification has been maintained by the World Wide Web Consortium (W3C) since 1998
XML (Spainhour, Ray, W3Schools)
Example of an XML File
<?xml version="1.0" encoding="UTF-8" ?>
<sandwiches>
  <sandwich name="Shrimp Poorboy">
    <price>5.99</price>
  </sandwich>
  <sandwich name="Grilled Burger">
    <price>4.99</price>
  </sandwich>
</sandwiches>
The XML declaration is optional, but when it is present it must be on the first line.
XML uses markup tags, like HTML, but developers can invent their own tag names.
As long as the tags follow the XML syntax rules, we can invent whatever tags and attributes are needed to describe our data.
XML (Spainhour, Ray, W3Schools)
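Because the tags describe the data, a program can ask for values by name. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the names MENU_XML and read_menu are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# The sandwich menu from the slide (MENU_XML is a name made up for this sketch).
MENU_XML = """<?xml version="1.0" encoding="UTF-8" ?>
<sandwiches>
  <sandwich name="Shrimp Poorboy">
    <price>5.99</price>
  </sandwich>
  <sandwich name="Grilled Burger">
    <price>4.99</price>
  </sandwich>
</sandwiches>"""


def read_menu(xml_text):
    """Return a list of (name, price) pairs from a <sandwiches> document."""
    root = ET.fromstring(xml_text)
    return [
        (sandwich.get("name"), float(sandwich.findtext("price")))
        for sandwich in root.findall("sandwich")
    ]


for name, price in read_menu(MENU_XML):
    print(f"{name}: ${price:.2f}")
```

The program never depends on line breaks or column positions; it only relies on the element and attribute names that the document author chose.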
Problem with HTML
<h1>Beginning ASP.NET 3.5 in C# 2008</h1><h2>Matthew MacDonald</h2>
It’s difficult to get the meaning of this data by looking at the HTML elements.
XML (Yue)
HTML provides the structure of a Web page, but not the semantic meaning of its content.
<book>
  <title>Beginning ASP.NET 3.5 in C# 2008</title>
  <author>Matthew MacDonald</author>
</book>
XML can provide the semantic meaning through its markup tags.
XML is Portable Data
XML files are plain text files that contain markup tags
Any software that can process plain text can read XML
• Hardware independent
• Software independent
• XML can be used to exchange data between incompatible systems
XML-aware applications
• Can process XML data as long as the application “knows” the meaning of the tags
• Meaning of the tags depends on the application
XML (Ding, W3Schools)
XML Technologies
XML
XML Namespaces
DTD (Document Type Definition)
• For describing your markup language
XML Schema
• An XML-based method of describing your markup language
XSL (Extensible Stylesheet Language)
• For displaying and transforming XML documents
DOM (Document Object Model)
• Object library for manipulating an XML document as a tree
XML (Yue)
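To make the DOM entry above concrete, here is a small sketch that treats a document as a tree of nodes. It assumes Python's standard xml.dom.minidom module; the helper add_sandwich and the "Club" data are hypothetical and exist only for this example.

```python
from xml.dom import minidom

# Parse a small document into a DOM tree.
doc = minidom.parseString(
    '<sandwiches><sandwich name="BLT"><price>5.99</price></sandwich></sandwiches>'
)


def add_sandwich(document, name, price):
    """Append <sandwich name="..."><price>...</price></sandwich> to the root."""
    sandwich = document.createElement("sandwich")
    sandwich.setAttribute("name", name)
    price_element = document.createElement("price")
    price_element.appendChild(document.createTextNode(price))
    sandwich.appendChild(price_element)
    document.documentElement.appendChild(sandwich)
    return sandwich


add_sandwich(doc, "Club", "6.49")  # hypothetical menu item

names = [s.getAttribute("name") for s in doc.getElementsByTagName("sandwich")]
print(names)
print(doc.documentElement.toxml())
```

Every element, attribute, and piece of text is a node, so adding data means creating nodes and attaching them to a parent.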
XML documents must be well-formed
An XML document that conforms to the minimal XML syntax rules is well-formed
Elements must always have a closing tag
Tag names and attribute names are case-sensitive
Elements must be properly nested
All attributes must have a value
All attribute values must be surrounded with quotes or apostrophes
The XML declaration, if present, is on the first line
The document has a single root element
XML (Spainhour, Ray, W3Schools)
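One way to see these rules in action is to hand small documents to a parser and note which ones it rejects. This is a sketch that assumes Python's standard xml.etree.ElementTree module; the function name is_well_formed and the sample strings are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Each sample breaks (or obeys) one of the rules listed above.
SAMPLES = {
    "<a><b>text</b></a>": "obeys all of the rules",
    "<a><b>text</a>": "missing closing tag for <b>",
    "<a><b></a></b>": "improperly nested",
    "<a></A>": "tag names are case-sensitive",
    "<a x></a>": "attribute without a value",
    "<a x=1></a>": "attribute value not quoted",
    "<a></a><b></b>": "more than one root element",
}

for sample, note in SAMPLES.items():
    print(f"{is_well_formed(sample)!s:5}  {sample:22}  {note}")
```

A parser that finds any of these errors stops processing the document, which is stricter than the way browsers treat HTML.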
Root Element
<?xml version="1.0" encoding="UTF-8" ?>
<sandwiches>
  <sandwich name="Shrimp Poorboy">
    <price>5.99</price>
  </sandwich>
  <sandwich name="Grilled Burger">
    <price>4.99</price>
  </sandwich>
</sandwiches>
Root element
XML (Spainhour, Ray, W3Schools)
The top-level element
• Only one
• All other elements must be nested within it
In an XHTML document, the root element is <html>
Tag Names
There are no predefined tag names; you must invent your own (or use tags that another developer invented)
Should be descriptive, so that the document can be self-describing
Should be short and concise
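Can contain letters, digits, hyphens, underscores, and periods
Must start with a letter or an underscore, not with a digit or a punctuation character such as the dollar sign, caret, percent symbol, or semicolon
Must not start with the letters “xml” in any combination of upper and lower case, because those names are reserved
Cannot contain spaces
Should not contain the characters “:” or “.”; the colon is reserved for namespaces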
XML (Spainhour, Ray, W3Schools)
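The naming guidelines above can be approximated with a regular expression. This is only a sketch: it follows the recommendations on this slide (so it rejects colons and periods) and it accepts only ASCII letters, whereas the XML specification allows many more Unicode characters. The function name is_recommended_tag_name is invented for this example.

```python
import re

# First character: ASCII letter or underscore.
# Remaining characters: ASCII letters, digits, underscores, or hyphens.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_\-]*")


def is_recommended_tag_name(name):
    """Check a tag name against the slide's guidelines (ASCII-only approximation)."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" are reserved
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["price", "phone-number", "2ndPrice", "$price",
                  "xmlData", "phone number", "uhcl:course"]:
    print(f"{candidate!r:16} {is_recommended_tag_name(candidate)}")
```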
Element Content
The text between the start tag and end tag
Content can be any of the following:
Empty, without content
<br />
Nested elements
<sandwich name="Shrimp Poorboy">
  <price>5.99</price>
</sandwich>
Character data
Character entities
&lt; &gt; &amp; &quot; &apos;
Processing instructions
<?xml-stylesheet type="text/xsl" href="simple.xsl"?>
Comments
<!-- This is a comment -->
CDATA sections
<?xml version="1.0" encoding="UTF-8" ?>
<sandwiches>
  <sandwich name="BLT">
    <price>5.99</price>
    <ingredients>
      <![CDATA[ Bacon, lettuce, & tomato ]]>
    </ingredients>
  </sandwich>
</sandwiches>
XML (Spainhour, Ray, W3Schools)
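The five predefined character entities stand in for characters that would otherwise be read as markup. As a sketch, assuming Python's standard xml.sax.saxutils and xml.etree.ElementTree modules, the following escapes raw text before placing it in an element; the names QUOTE_ENTITIES and to_xml_text are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() always handles &, < and >; the quote characters are optional extras.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def to_xml_text(raw):
    """Replace the five special characters with their predefined entities."""
    return escape(raw, QUOTE_ENTITIES)


raw = "Bacon, lettuce, & tomato"
escaped = to_xml_text(raw)
print(escaped)

# The parser expands the entity again, so the original text comes back.
element = ET.fromstring("<ingredients>" + escaped + "</ingredients>")
print(element.text)
```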
CDATA
Can be inserted anywhere that character data can occur
Begins with the characters <![CDATA[
Ends with the characters ]]>
All characters within a CDATA section are treated as a literal part of the character data and are not parsed as XML
XML (Spainhour, Ray, W3Schools)
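The effect of a CDATA section can be checked with a parser. This sketch assumes Python's standard xml.etree.ElementTree module; the constant and function names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# The same text with and without a CDATA section around it.
WITH_CDATA = "<ingredients><![CDATA[Bacon, lettuce, & tomato <yum>]]></ingredients>"
WITHOUT_CDATA = "<ingredients>Bacon, lettuce, & tomato</ingredients>"


def text_or_error(xml_text):
    """Return the element's text, or a short message if parsing fails."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as error:
        return f"not well-formed: {error}"


print(text_or_error(WITH_CDATA))     # the & and <yum> arrive as plain characters
print(text_or_error(WITHOUT_CDATA))  # the bare & is rejected by the parser
```

Inside the CDATA section the ampersand and the angle brackets are ordinary characters, so no entities are needed.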
Attributes
<sandwich name="Poorboy"/>
Attribute
Name-value pair that describes a property of the element
Can be included in the start tag or an empty tag
A particular attribute can appear only once in the same tag
XML (Spainhour, Ray, W3Schools)
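A short sketch of how a program reads attributes, assuming Python's standard xml.etree.ElementTree module (the helper name parses is made up). It also shows that repeating an attribute in the same tag is an error.

```python
import xml.etree.ElementTree as ET

sandwich = ET.fromstring('<sandwich name="Poorboy"/>')

# Attributes are available one at a time or as a dictionary.
print(sandwich.get("name"))
print(sandwich.attrib)
print(sandwich.get("price"))  # an attribute that is absent yields None


def parses(xml_text):
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The same attribute name twice in one tag is not well-formed.
print(parses('<sandwich name="Poorboy" name="BLT"/>'))
```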
Validation
A DTD describes your XML markup language
• Which tags can be used
• What each element can contain
A document can be tested with the DTD, and if it passes then it is valid
• Must be well-formed
• Must be free of mistakes
  ‒ No misspelled tag names
  ‒ No improper nesting
  ‒ No missing elements
Validation is important when software expects a particular document structure, and when separate groups of people need to agree on a common language for data exchange
XML (Spainhour, Ray, W3Schools)
DTD
Defines the structure or grammar of an XML document by describing your markup language
Used to test whether the XML document is valid
Can be internal or external
Can contain the following types of markup declarations
• ELEMENT – the XML elements
• ATTLIST – attributes of the elements
• ENTITY – characters referenced using the “&...;” syntax
• NOTATION – description of the data format
• Processing instructions
• Comments
XML (Yue, Spainhour, Ray, W3Schools)
DTD Example
If we want to maintain a phone list as an XML document, the DTD might look like the following:
XML (Yue, Spainhour, Young, W3Schools)
<!ELEMENT phonelist (person)*>
<!ELEMENT person (name,phonenumber)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phonenumber (areacode,number)>
<!ELEMENT areacode (#PCDATA)>
<!ELEMENT number (#PCDATA)>
This DTD defines a phone list that contains the name, area code and phone number of each person in the list.
Element Declarations
ELEMENT declarations describe the elements, which are the “building blocks” of an XML document.
<!ELEMENT phonelist (person)*>
<!ELEMENT person (name,phonenumber)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phonenumber (areacode,number)>
<!ELEMENT areacode (#PCDATA)>
<!ELEMENT number (#PCDATA)>
The first line declares that a “phonelist” element has element content, and it can contain zero or more “person” child elements.
<!ELEMENT phonelist (person)*>
“<!ELEMENT” begins the element declaration
“phonelist” is the tag name of this element
“(person)*” means the content can be zero or more “person” elements
“>” ends the element declaration
These three characters can be used to specify the number of elements:
* Zero or more
+ One or more
? Zero or one
XML (Yue, Spainhour, Young, W3Schools)
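The occurrence characters behave much like the same characters in a regular expression. The sketch below is an analogy only, not how a real validating parser is written: it joins the child tag names into one string and matches it against a pattern. The function name allows is made up for this example.

```python
import re


def allows(name, indicator, children):
    """Check a list of child tag names against a one-name content model.

    indicator is "" (exactly one), "*" (zero or more),
    "+" (one or more) or "?" (zero or one).
    """
    # Each child name is followed by a space so that names cannot run together.
    pattern = re.compile(f"(?:{re.escape(name)} ){indicator}")
    sequence = "".join(child + " " for child in children)
    return pattern.fullmatch(sequence) is not None


print(allows("person", "*", []))                    # (person)* accepts an empty list
print(allows("person", "+", []))                    # (person)+ needs at least one
print(allows("person", "?", ["person", "person"]))  # (person)? accepts at most one
print(allows("person", "", ["person"]))             # (person) needs exactly one
```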
Element Declarations
The second line declares that a “person” element has element content, and it must contain exactly one of each of the elements “name” and “phonenumber,” in that order.
<!ELEMENT person (name,phonenumber)>
“person” is the tag name of this element
When there are multiple child elements with commas separating the names, as in “(name,phonenumber)”, the child elements must appear in that specific sequence
XML (Yue, Spainhour, Young, W3Schools)
Element Declarations
The third line declares that the content of the “name” element is simple character data.
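<!ELEMENT name (#PCDATA)>
“name” is the tag name of this element
“PCDATA” stands for “parsed character data,” text that will be parsed by the XML parser. Entity references in the text are expanded, and the characters “<” and “&” must be written as entities because the parser treats them as the start of markup. An element declared this way cannot contain child elements, but it can be empty.
XML (Ding, Yue, Young, W3Schools)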
Element Declarations
<!ELEMENT phonelist (person)*>
<!ELEMENT person (name,phonenumber)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phonenumber (areacode,number)>
<!ELEMENT areacode (#PCDATA)>
<!ELEMENT number (#PCDATA)>
What can you say about the next three declarations?
XML (Yue, Spainhour, Young, W3Schools)
Element Declarations
<!ELEMENT phonelist (person)*>
<!ELEMENT person (name,phonenumber)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phonenumber (areacode,number)>
<!ELEMENT areacode (#PCDATA)>
<!ELEMENT number (#PCDATA)>
Is the following XML document valid, according to this DTD?
<?xml version="1.0" encoding="UTF-8"?>
<phonelist>
  <person>
    <name>Charles Moen</name>
    <phonenumber>
      <areacode>281</areacode>
      <number>283-3848</number>
    </phonenumber>
  </person>
</phonelist>
XML (Yue, Spainhour, Young, W3Schools)
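To explore the question, the six declarations can be translated by hand into a small structural check. This is a sketch under stated assumptions, not a DTD validator: Python's standard library does not validate against DTDs, so the table CONTENT_MODELS and the functions conforms and matches_phonelist_dtd are written only for this DTD and are invented names. The check looks at element structure only and ignores whether a DOCTYPE declaration is present.

```python
import re
import xml.etree.ElementTree as ET

PCDATA = None  # marker for (#PCDATA): text only, no child elements

# Each content model becomes a regular expression over the child tag
# names, where every name is followed by a single space.
CONTENT_MODELS = {
    "phonelist": r"(?:person )*",
    "person": r"name phonenumber ",
    "name": PCDATA,
    "phonenumber": r"areacode number ",
    "areacode": PCDATA,
    "number": PCDATA,
}


def conforms(element):
    """Recursively compare an element with the hand-translated declarations."""
    if element.tag not in CONTENT_MODELS:
        return False  # undeclared element
    model = CONTENT_MODELS[element.tag]
    children = list(element)
    if model is PCDATA:
        return not children
    # Element content: only whitespace may appear between the children.
    texts = [element.text] + [child.tail for child in children]
    if any(text and text.strip() for text in texts):
        return False
    sequence = "".join(child.tag + " " for child in children)
    if re.fullmatch(model, sequence) is None:
        return False
    return all(conforms(child) for child in children)


def matches_phonelist_dtd(xml_text):
    root = ET.fromstring(xml_text)
    return root.tag == "phonelist" and conforms(root)


PHONELIST_XML = """<?xml version="1.0" encoding="UTF-8"?>
<phonelist>
  <person>
    <name>Charles Moen</name>
    <phonenumber>
      <areacode>281</areacode>
      <number>283-3848</number>
    </phonenumber>
  </person>
</phonelist>"""

print(matches_phonelist_dtd(PHONELIST_XML))
```

Swapping the order of areacode and number, or leaving out phonenumber, makes the check fail, which mirrors the sequence rule in the declarations.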
Using an External DTD
Use the DOCTYPE declaration to connect the XML document with an external DTD
phonelist.dtd
<!ELEMENT phonelist (person)*>
<!ELEMENT person (name,phonenumber)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phonenumber (areacode,number)>
<!ELEMENT areacode (#PCDATA)>
<!ELEMENT number (#PCDATA)>
phonelist.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE phonelist SYSTEM "phonelist.dtd">
<phonelist>
  <person>
    <name>Charles Moen</name>
    <phonenumber>
      <areacode>281</areacode>
      <number>283-3848</number>
    </phonenumber>
  </person>
</phonelist>
XML (Yue, Spainhour, Young, W3Schools)
<!DOCTYPE phonelist SYSTEM "phonelist.dtd">
“phonelist” is the root element
The keyword is either SYSTEM or PUBLIC (if PUBLIC, then it must be followed by both a name and a URI)
"phonelist.dtd" describes the location of the DTD, and can be relative or fully qualified, such as: "http://sce.uhcl.edu/moenc/dtds/phonelist.dtd"
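The parts of the DOCTYPE declaration can be read back from a parsed document. This is a sketch that assumes Python's standard xml.dom.minidom module, which records the declaration but does not fetch or apply the external DTD; the constant name PHONELIST_XML is made up for the example.

```python
from xml.dom import minidom

PHONELIST_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE phonelist SYSTEM "phonelist.dtd">
<phonelist>
  <person>
    <name>Charles Moen</name>
    <phonenumber>
      <areacode>281</areacode>
      <number>283-3848</number>
    </phonenumber>
  </person>
</phonelist>"""

doc = minidom.parseString(PHONELIST_XML)
doctype = doc.doctype

print(doctype.name)      # the root element named in the declaration
print(doctype.systemId)  # the location of the DTD
print(doctype.publicId)  # the public name, when PUBLIC is used

# The name in the declaration should match the actual root element.
print(doctype.name == doc.documentElement.tagName)
```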
Using an Internal DTD
An internal DTD is placed in the DOCTYPE declaration of the XML document.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE phonelist [
  <!ELEMENT phonelist (person)*>
  <!ELEMENT person (name,phonenumber)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT phonenumber (areacode,number)>
  <!ELEMENT areacode (#PCDATA)>
  <!ELEMENT number (#PCDATA)>
]>
<phonelist>
  <person>
    <name>Charles Moen</name>
    <phonenumber>
      <areacode>281</areacode>
      <number>283-3848</number>
    </phonenumber>
  </person>
</phonelist>
XML (Yue, Spainhour, Young, W3Schools)
phonelist.xml
More about Element Declarations
ELEMENT content can be specified in several forms.
<!ELEMENT phonelist (listitem)*>
<!ELEMENT listitem (person | department)>
<!ELEMENT department (name,phonenumber)>
<!ELEMENT person (name,phonenumber)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phonenumber (areacode,number)>
<!ELEMENT areacode (#PCDATA)>
<!ELEMENT number (#PCDATA)>
The “choice” form, as in (person | department), specifies a series of possible child elements, exactly one of which must appear
The “sequence” form, as in (name,phonenumber), specifies a required sequence of child elements
<!ELEMENT misc ANY>
The “ANY” keyword means the element can have any legal content, in any order.
<!ELEMENT br EMPTY>
The “EMPTY” keyword means the element must have no content.
XML (Yue, Spainhour, Young, W3Schools)
Attribute-List Declarations
All attributes must be explicitly declared with an “ATTLIST” declaration.
<!ELEMENT phonelist (listitem)*>
<!ELEMENT listitem (person | department)>
<!ELEMENT department (name,phonenumber)>
<!ELEMENT person (name,phonenumber)>
<!ATTLIST person title CDATA #REQUIRED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phonenumber (areacode,number)>
<!ELEMENT areacode (#PCDATA)>
<!ELEMENT number (#PCDATA)>
Here, the “title” attribute is required and it must be CDATA. A required attribute cannot also declare a default value, because the #REQUIRED keyword takes the place of the default.
XML (Yue, Spainhour, Young, W3Schools)
<!ATTLIST person title (Dr|Ms|Mr) "Dr">
Here, the “title” attribute is not required; it must be one of the three values that are enumerated; and it defaults to “Dr”.
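The enumerated declaration can be imitated in a few lines to show what a validating parser is expected to do with it. This is only a sketch with invented names (ALLOWED_TITLES, DEFAULT_TITLE, effective_title); Python's xml.etree.ElementTree does not validate, so the rule is applied by hand here.

```python
import xml.etree.ElementTree as ET

# Hand translation of: <!ATTLIST person title (Dr|Ms|Mr) "Dr">
ALLOWED_TITLES = ("Dr", "Ms", "Mr")
DEFAULT_TITLE = "Dr"


def effective_title(person):
    """Return the title, falling back to the default when it is omitted."""
    title = person.get("title", DEFAULT_TITLE)
    if title not in ALLOWED_TITLES:
        raise ValueError(f"title {title!r} is not one of {ALLOWED_TITLES}")
    return title


print(effective_title(ET.fromstring("<person/>")))              # default applies
print(effective_title(ET.fromstring('<person title="Ms"/>')))  # explicit value
```

A value outside the enumeration, such as title="Prof", raises ValueError in this sketch, in the same way that a validating parser would report the document as invalid.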
XML Namespaces
<?xml version="1.0" encoding="UTF-8"?>
<uhcl:courses xmlns:uhcl="http://www.uhcl.edu/ns">
<uhcl:course>
<uhcl:title>Charles Moen</uhcl:title>
<uhcl:rubric>CSCI/CINF</uhcl:rubric>
<uhcl:number>4230</uhcl:number>
</uhcl:course>
</uhcl:courses>
XML (Yue, Spainhour, Young, W3Schools)
We can be sure that there is no conflict with element names by using a namespace.
The namespace must be declared on the element that uses it or on one of its ancestors, and the declaration is often in the root element.
The namespace identifier must be unique, and is usually a URL. (The URL does not have to be a valid URL of a Web page.)
The qualified element name consists of the namespace prefix, followed by a colon, followed by the local name.
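A parser resolves each prefix to its namespace identifier, so programs match on the identifier rather than on the prefix. The sketch below assumes Python's standard xml.etree.ElementTree module; the names COURSES_XML and NS are made up for the example.

```python
import xml.etree.ElementTree as ET

COURSES_XML = """<?xml version="1.0" encoding="UTF-8"?>
<uhcl:courses xmlns:uhcl="http://www.uhcl.edu/ns">
  <uhcl:course>
    <uhcl:title>Charles Moen</uhcl:title>
    <uhcl:rubric>CSCI/CINF</uhcl:rubric>
    <uhcl:number>4230</uhcl:number>
  </uhcl:course>
</uhcl:courses>"""

# Map a prefix to the namespace identifier for use in searches.
NS = {"uhcl": "http://www.uhcl.edu/ns"}

root = ET.fromstring(COURSES_XML)
print(root.tag)  # ElementTree writes the identifier in braces before the local name

course = root.find("uhcl:course", NS)
print(course.findtext("uhcl:rubric", namespaces=NS),
      course.findtext("uhcl:number", namespaces=NS))

# Searching without the namespace finds nothing.
print(root.find("course"))
```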
Just for Fun
XSL
XML
An XSL (Extensible Stylesheet Language) document can be used to transform the data in an XML document to an HTML document, or a document in some other format.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="simple.xsl"?>
<phonelist>
  <person>
    <name>Charles Moen</name>
    <phonenumber>
      <areacode>281</areacode>
      <number>283-3848</number>
    </phonenumber>
  </person>
</phonelist>
simple.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head>
        <title>Demo XSL</title>
      </head>
      <body>
        <h1>Phone List</h1>
        <table border="1" cellspacing="0" cellpadding="5" width="480">
          <tr><th>Name</th><th>Phone number</th></tr>
          <xsl:apply-templates select="phonelist/person"/>
        </table>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="person">
    <tr>
      <td><xsl:value-of select="@title"/> <xsl:value-of select="name"/></td>
      <td>
        (<xsl:value-of select="phonenumber/areacode"/>)
        <xsl:value-of select="phonenumber/number"/>
      </td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
The XSL must be linked to the XML with the xml-stylesheet processing instruction
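Python's standard library has no XSLT processor, so the following sketch does not run the stylesheet. It imitates by hand what the two templates do, using xml.etree.ElementTree, to show how each template rule corresponds to a step in the transformation. The function names person_row and phonelist_to_html are invented, and the whitespace of the result may differ from what a real XSLT processor produces.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PHONELIST_XML = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="simple.xsl"?>
<phonelist>
  <person>
    <name>Charles Moen</name>
    <phonenumber>
      <areacode>281</areacode>
      <number>283-3848</number>
    </phonenumber>
  </person>
</phonelist>"""


def person_row(person):
    """Imitate the template that matches "person": one table row per person."""
    title = person.get("title", "")
    name = person.findtext("name", default="")
    area = person.findtext("phonenumber/areacode", default="")
    number = person.findtext("phonenumber/number", default="")
    return (f"<tr><td>{escape(title)} {escape(name)}</td>"
            f"<td>({escape(area)}) {escape(number)}</td></tr>")


def phonelist_to_html(xml_text):
    """Imitate the template that matches "/": the page around the rows."""
    root = ET.fromstring(xml_text)
    rows = "".join(person_row(person) for person in root.findall("person"))
    return ("<html><head><title>Demo XSL</title></head><body>"
            "<h1>Phone List</h1>"
            '<table border="1" cellspacing="0" cellpadding="5" width="480">'
            "<tr><th>Name</th><th>Phone number</th></tr>"
            f"{rows}</table></body></html>")


html = phonelist_to_html(PHONELIST_XML)
print(html)
```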
References
Ding, Wei. “XML.” UHCL lecture slides, 2008.
Ray, Erik T. Learning XML. O'Reilly, 2001.
Spainhour, Stephen and Robert Eckstein. Webmaster in a Nutshell, 3rd Edition. O'Reilly, 2002.
W3Schools Online Web Tutorials. “DTD Tutorial.” [Online]. Available: http://www.w3schools.com/dtd/default.asp
W3Schools Online Web Tutorials. “XML Tutorial.” [Online]. Available: http://www.w3schools.com/xml/default.asp
Young, Michael J. XML Step by Step. Microsoft Press, 2000.
Yue, Kwok-Bun. “An Introduction to XML.” UHCL lecture notes, 2001.